--- name: do-task description: Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review) in Codex. ALWAYS invoke when the user says `/do-task`, "do this task", "do task ...", "execute this task", or "make it so". Also invoke on the hint phrase "just do ...". Do NOT invoke on "implement this" (that phrase is reserved for implement-plan). --- # Do Task (Codex Native Superpowers) Execute an ad-hoc user prompt end-to-end: parse → clarify → plan (with reviewer loop) → implement (TDD-first where applicable) → verify → implementation review loop → commit → optional push → notify. This is a single-artifact sibling of `create-plan` + `implement-plan`. Unlike `implement-plan`, `do-task` operates on one persistent `task-plan.md` (not a full milestone plan) and defaults to the **current branch** (not a worktree). **Core principle:** Codex uses native skill discovery from `~/.agents/skills/`. Do not use deprecated `superpowers-codex bootstrap` or `use-skill` CLI commands. ## Prerequisite Check (MANDATORY) Required: - Codex CLI: `codex --version` - Superpowers repo: `https://github.com/obra/superpowers` - Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills` - `superpowers:brainstorming` - `superpowers:test-driven-development` - `superpowers:verification-before-completion` - `superpowers:finishing-a-development-branch` - `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree) - Shared reviewer runtime: `~/.codex/skills/reviewer-runtime/run-review.sh` - Telegram notifier helper: `~/.codex/skills/reviewer-runtime/notify-telegram.sh` Verify before proceeding: ```bash test -L ~/.agents/skills/superpowers test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md ``` If any required dependency is missing, stop immediately and return: `Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.` ## Required Skill Invocation Rules - Invoke relevant skills through native discovery (no CLI wrapper). - Announce skill usage explicitly: - `I've read the [Skill Name] skill and I'm using it to [purpose].` - For skills with checklists, track checklist items with `update_plan` todos. - Tool mapping for Codex: - `TodoWrite` -> `update_plan` - `Task` subagents -> unavailable in Codex; do the work directly and state the limitation - `Skill` -> use native skill discovery from `~/.agents/skills/` ## Trigger Phrase Detection **Binding triggers** (always invoke this skill): - `/do-task` - "do this task" - "do task ..." - "execute this task" - "make it so" **Hint trigger** (invoke unless context clearly maps to another skill): - "just do ..." **Escape phrases** (skip the Phase 2 clarifying-question loop): - `--no-questions` - `"just do it:"` - `"just do this:"` - `"no questions:"` **Excluded** (do NOT trigger `do-task`): - "implement this" — reserved for `implement-plan`. **Dropped defaults** (explicitly NOT binding triggers): - "work on ..." - "handle this" - "take care of ..." - "get this done" **Worktree opt-in phrases** (Phase 4 takes the worktree branch): - "in a worktree" - "use a worktree" - "on an isolated branch" - "on a new branch called X" ## Process ### Phase 1: Preflight 1. Verify git repo: `git rev-parse --is-inside-work-tree`. 2. Verify `/ai_plan/` is present in `.gitignore`. If missing: - Append `/ai_plan/` to `.gitignore`. - Commit that infra change immediately with message `chore(gitignore): ignore ai_plan local planning artifacts`. - This infra commit is EXPLICITLY separate from the task commit in Phase 9. It may occur even when the final task ends up `aborted` or `failed`. 3. Verify required sub-skills are discoverable under `~/.agents/skills/superpowers/`. Announce each sub-skill before invocation using: `I've read the [Skill Name] skill and I'm using it to [purpose].` ### Phase 2: Parse Prompt and Question 1. Capture the exact user prompt verbatim. 2. Detect trigger phrase (see above) and record which one matched. 3. Detect escape phrase. If set, skip clarifying questions entirely. 4. Apply the ask-first heuristic: - Skip clarifying questions ONLY if ALL are true: - Prompt names a concrete target (file, feature, or function). - Prompt names a concrete outcome (what success looks like). - Prompt has no ambiguous scope (no "and maybe also ..."). - All identifiers in the prompt are resolvable against the codebase. - Otherwise, ask 1-3 clarifying questions, ONE AT A TIME, multiple-choice preferred. - Empty prompt → ask exactly once: "what task?". 5. Invoke `superpowers:brainstorming` via native discovery (`~/.agents/skills/superpowers/brainstorming/SKILL.md`) for any **behavior-changing** task — feature creation, bug fix with multiple plausible approaches, refactor, design decision. Present 2-3 approaches and recommend one before finalizing the plan. The ONLY skip conditions are the same ones that allow TDD auto-skip: `pure-documentation` and `pure-comment-whitespace-rename`. When skipping, record the skip reason in the Interpretation section of `task-plan.md`. ### Phase 3: Configure Reviewer If the user has already specified a reviewer CLI and model (e.g., "do task X, review with codex gpt-5.4"), use those values. If the user says "use defaults" or otherwise opts out of explicit configuration, proceed with `REVIEWER_CLI=codex`, `REVIEWER_MODEL=gpt-5.4`, and `MAX_ROUNDS=10`. Otherwise, ask: 1. **Which CLI should review both the plan and the implementation?** - `codex` — OpenAI Codex CLI (`codex exec`) - `claude` — Claude Code CLI (`claude -p`) - `cursor` — Cursor Agent CLI (`cursor-agent -p`) - `opencode` — OpenCode CLI (`opencode run`) - `skip` — No external review, proceed with user approval only at each loop. 2. **Which model?** (only if a CLI was chosen) - For `codex`: default `gpt-5.4`, alternatives: `gpt-5.3-codex`, `o4-mini`, `o3`. - For `claude`: default `sonnet`, alternatives: `opus`, `haiku`. - For `cursor`: **run `cursor-agent models` first** to see available models. - For `opencode`: provider-qualified form `/` (e.g., `anthropic/claude-sonnet-4-5`, `openai/gpt-5.4`). Run `opencode models` to list available models. - Accept any model string the user provides. 3. **Max review rounds shared across both loops?** (default: 10) - If the user does not provide a value, set `MAX_ROUNDS=10`. Store `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phases 5 and 8. Reviewer CLI: `codex`, `claude`, `cursor`, `opencode`, `pi`, or `skip`. If `REVIEWER_CLI=pi`, verify the Pi reviewer binary before entering the review loop: ```bash pi --version ``` For shorthand `pi/`, split only on the first slash when the prefix is exactly `pi`; store the complete remainder in `REVIEWER_MODEL`. Examples: `pi/claude-opus-4-7` -> `claude-opus-4-7`, `pi/anthropic/claude-opus-4-7` -> `anthropic/claude-opus-4-7`, and `pi/openrouter/anthropic/claude-opus-4-7` -> `openrouter/anthropic/claude-opus-4-7`. When `REVIEWER_CLI=pi`, the reviewer model is configured independently from the model running this workflow. If the model/provider is unavailable, surface helper stderr/status and use `pi --list-models [search]` to inspect configured models. ### Phase 4: Initialize Plan Workspace Codex has no plan-mode concept; there is no plan-mode guard here. Steps: 1. Compute slug: `YYYY-MM-DD-` where `` is a kebab-case hash of the task goal (lowercase, alphanumeric + hyphens only). 2. Compute plan folder: `ai_plan//`. 3. **Resume detection:** If the folder already exists, read `task-plan.md`: - If `Status` is `draft` or `plan-approved` or `implementation-in-progress`: offer to resume, pick a new suffix (`-v2`), or abort. Default is resume. - If `Status` is any terminal value (`pushed`, `local-only`, `aborted-*`, `failed`): offer a new suffix or abort. Default is new suffix. 4. If not resuming, create the folder and write `task-plan.md` from the template at `templates/task-plan.md` (this skill's template folder; falls back to `~/.codex/skills/do-task/templates/task-plan.md` when installed directly). 5. Fill in: - `Metadata` block. - `Prompt` (verbatim). - `Interpretation`, `Assumptions`, `Files`, `Approach`, `TDD Approach`, `Acceptance Criteria`, `Verification`, `Rollback`. - Leave `Runtime State`, `Review History`, `Final Status` empty (skill updates these). 6. Set `Status: draft`. **Worktree branch:** If the prompt opts in to a worktree (see Trigger Phrase Detection), invoke `superpowers:using-git-worktrees` via native discovery before proceeding. Otherwise continue on the current branch. ### Phase 5: Plan Review Loop If `REVIEWER_CLI=skip`, present `task-plan.md` to the user and proceed only after explicit user approval. Otherwise, invoke the Review Loop (Shared Subroutine) with: ```text REVIEW_KIND = plan REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) PAYLOAD_PATH = /tmp/do-task-plan-${REVIEW_ID}.md PROMPT_TEMPLATE = PLAN_REVIEW_PROMPT (see below) SESSION_ID_VAR = CODEX_PLAN_SESSION_ID | CURSOR_PLAN_SESSION_ID | OPENCODE_PLAN_SESSION_ID ``` Payload is the current `task-plan.md` **with the `Runtime State` and `Review History` blocks stripped** before writing to `PAYLOAD_PATH`. Those two blocks contain reviewer session IDs and scan outcomes that must never be sent back to any reviewer CLI. Reviewers only need the Prompt, Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback, and Metadata sections. `PLAN_REVIEW_PROMPT`: ```text Review this task plan for completeness, correctness, and risk. Focus on: 1. Does the plan match the user's prompt? 2. Are all assumptions surfaced? 3. Are acceptance criteria testable? 4. Is the TDD approach appropriate per the TDD Approach section? 5. Are there missing files, risks, or security concerns? Return exactly these sections in order: ## Summary ## Findings ### P0 ### P1 ### P2 ### P3 ## Verdict Rules: - Order findings from highest severity to lowest. - Use `- None.` when a severity has no findings. - `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have. - End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`. - `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking. ``` On APPROVED: - Set `Status: plan-approved`. - Append APPROVED row to Review History. - Proceed to Phase 6. On MAX_ROUNDS: - Set `Status: aborted-plan-review`. - Send Telegram summary before stopping. - Ask the user whether to override and proceed, restart, or abort. ### Phase 6: Execute (TDD-first where applicable) Native orchestration — do not invoke `superpowers:executing-plans`. 1. Set `Status: implementation-in-progress`. 2. For every behavior-changing file edit: - Invoke `superpowers:test-driven-development` via native discovery. - Write the failing test first. Run it. Confirm it fails. - Implement the minimal code to make it pass. Run the test. Confirm green. - Do NOT commit yet — a single task commit happens in Phase 9. 3. Auto-skip of TDD is permitted ONLY for tasks classified in `task-plan.md` TDD Approach as: - `pure-documentation` - `pure-comment-whitespace-rename` 4. Any other skip (including `pure-config-addition`) requires explicit user approval recorded in `task-plan.md` with an ISO-8601 timestamp. 5. Update `task-plan.md` after each logical step: add notes to `Approach`, check off `Acceptance Criteria` items as they complete. ### Phase 7: Verification Gate Invoke `superpowers:verification-before-completion` via native discovery. Run the commands listed in the `Verification` section of `task-plan.md`: - Lint (changed files first). - Typecheck. - Tests (targeted first, then broader suite if quick). All must pass. If a command fails: - Fix the issue. - Re-run that command. - Increment `verification_attempts` in Runtime State. If `verification_attempts` exceeds 3 without green: - Set `Status: aborted-verification`. - Send Telegram summary. - Ask the user whether to retry, override, or abort. ### Phase 8: Implementation Review Loop If `REVIEWER_CLI=skip`, present a diff + verification summary to the user and proceed only after explicit user approval. Otherwise, invoke the Review Loop (Shared Subroutine) with: ```text REVIEW_KIND = implementation REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) # distinct from plan-review ID PAYLOAD_PATH = /tmp/do-task-implementation-${REVIEW_ID}.md PROMPT_TEMPLATE = IMPL_REVIEW_PROMPT (see below) SESSION_ID_VAR = CODEX_IMPL_SESSION_ID | CURSOR_IMPL_SESSION_ID | OPENCODE_IMPL_SESSION_ID ``` Payload contents (assembled by the skill): ```markdown # Implementation Review: [Short Title] ## Task Plan (the plan that was approved) ## Changes Made (git diff) ## Verification Output ### Lint ### Typecheck ### Tests ``` `IMPL_REVIEW_PROMPT`: ```text Review this implementation against the task plan. Focus on: 1. Correctness — Does the diff satisfy the Acceptance Criteria? 2. Code quality — Clean, maintainable, no obvious issues? 3. Test coverage — Are behavior changes adequately tested (per the plan's TDD Approach)? 4. Security — Any security concerns introduced? 5. Regressions — Does the diff risk breaking unrelated code? Return exactly these sections in order: ## Summary ## Findings ### P0 ### P1 ### P2 ### P3 ## Verdict Rules: - Order findings from highest severity to lowest. - Use `- None.` when a severity has no findings. - `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have. - End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`. - `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking. ``` On APPROVED: - Set `Status: implementation-approved`. - Append APPROVED row to Review History. - Proceed to Phase 9. On MAX_ROUNDS: - Set `Status: aborted-impl-review`. - Send Telegram summary. - Ask the user whether to override and commit anyway, restart, or abort. ### Phase 9: Commit + Push Ask Invoke `superpowers:finishing-a-development-branch` via native discovery. 1. Stage all changed files explicitly (avoid `git add -A`). 2. Single commit with message derived from the task goal: - Format: `(): ` - Example: `feat(auth): add session token rotation` 3. Do NOT push. Update `Status: local-only`. 4. Ask the user: "Push to remote? (yes / no)" - On explicit `yes` → push, then set `Status: pushed`. - Any other response → leave `Status: local-only`. ### Phase 10: Telegram Notification + Finalize Resolve the notifier helper: ```bash TELEGRAM_NOTIFY_RUNTIME=~/.codex/skills/reviewer-runtime/notify-telegram.sh ``` On every terminal outcome (`pushed`, `local-only`, `aborted-*`, `failed`), send a Telegram summary if both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set: ```bash if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then "$TELEGRAM_NOTIFY_RUNTIME" --message "do-task : " fi ``` Rules: - Telegram is the only supported notification path. - Notification failures are non-blocking but must be surfaced to the user. - Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured. - If Telegram is not configured, state that no Telegram notification was sent. Fill in `Final Status` in `task-plan.md` (include commit hash if any). Do NOT delete the plan folder — it stays as a record. --- ## Review Loop (Shared Subroutine) This subroutine is invoked twice per `do-task` run: once in Phase 5 (`REVIEW_KIND=plan`) and once in Phase 8 (`REVIEW_KIND=implementation`). Separate session IDs are used for each loop so reviewer context never leaks across loops. ### Subroutine Inputs | Variable | Purpose | |----------|---------| | `REVIEW_KIND` | `plan` or `implementation` | | `REVIEW_ID` | 8-char hex (from `uuidgen`); reused across rounds of the same loop | | `PAYLOAD_PATH` | `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` | | `PROMPT_TEMPLATE` | `PLAN_REVIEW_PROMPT` or `IMPL_REVIEW_PROMPT` | | `REVIEWER_CLI` | `codex` \| `claude` \| `cursor` \| `opencode` \| `pi` | | `REVIEWER_MODEL` | Model name | | `MAX_ROUNDS` | Default 10 | | `SESSION_ID_VAR` | `CODEX_PLAN_SESSION_ID` \| `CODEX_IMPL_SESSION_ID` \| `CURSOR_PLAN_SESSION_ID` \| `CURSOR_IMPL_SESSION_ID` \| `OPENCODE_PLAN_SESSION_ID` \| `OPENCODE_IMPL_SESSION_ID` | Temp artifact paths (per loop): - `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` — payload - `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md` — normalized review text - `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json` — raw Cursor/OpenCode JSON (cursor only, plus opencode when `--format json` is used) - `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr` - `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status` - `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` - `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh` Resolve the shared helper: ```bash REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh ``` Set helper success-artifact args before writing the command script: ```bash HELPER_SUCCESS_FILE_ARGS=() case "$REVIEWER_CLI" in codex) HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md) ;; cursor) HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json) ;; esac ``` ### Step 1: Write Payload Write the full payload for this round to `PAYLOAD_PATH`. ### Step 1a: Secret Scan (per-payload, no caching) **BEFORE** sending the payload to any reviewer CLI, scan it for secrets. This scan runs EVERY round — no results are cached. Rationale: Phase 8 payloads include newly-introduced diff content that earlier rounds never saw. Run the secret scan with all of these anchored regexes. Use `grep -En` on the payload file: ```bash SECRET_REGEX_FILE=$(mktemp) cat >"$SECRET_REGEX_FILE" <<'EOF' AKIA[0-9A-Z]{16} "type"\s*:\s*"service_account" (ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,} xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,} xox[abpsr]-[A-Za-z0-9]{10,48} sk-(proj-)?[A-Za-z0-9_-]{20,} sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,} -----BEGIN [A-Z ]+ PRIVATE KEY----- (TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,} eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+ EOF SCAN_MATCHES=$(grep -Ensf "$SECRET_REGEX_FILE" "$PAYLOAD_PATH" || true) rm -f "$SECRET_REGEX_FILE" ``` If `SCAN_MATCHES` is non-empty: 1. **Redact the matched text before surfacing** — never echo the raw secret to the user, chat log, terminal scrollback, or any persistent file. Replace each matched substring with a fixed token that preserves only the fact of a match: `[REDACTED::-chars]`. Example: a matched AWS key becomes `[REDACTED:aws-access-key:20-chars]`. Keep the file path and line number; they are useful for the user and not secret. 2. Present the redacted match summary to the user using this exact wording: ```text SECRET-SCAN MATCH in outbound reviewer payload (loop: ${REVIEW_KIND}, round: N): :: [REDACTED::-chars] ... Proceed with sending this payload to ${REVIEWER_CLI}? (yes / no / redact) ``` Pattern labels: `aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`, `anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`. 3. Wait for user response. 4. On `yes`: record `last_scan_outcome_${REVIEW_KIND}=user-approved-with-matches` in Runtime State, and proceed. 5. On `redact`: ask the user to supply redactions, apply them to `PAYLOAD_PATH`, re-scan (this step), record `last_scan_outcome_${REVIEW_KIND}=redacted-and-approved`. 6. On `no`: stop the loop, set `Status: failed`, send Telegram, return to the user. If `SCAN_MATCHES` is empty, record `last_scan_outcome_${REVIEW_KIND}=clean` and proceed. ### Step 2: Generate Reviewer Command Script Write the reviewer invocation to `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh` as a bash script starting with: ```bash #!/usr/bin/env bash set -euo pipefail ``` **If `REVIEWER_CLI` is `pi`:** Fresh call every round (Pi reviewer calls do not use session resume): ```bash pi --no-session --no-skills --no-prompt-templates --no-extensions --no-context-files \ --model "$REVIEWER_MODEL" \ --tools read,grep,find,ls \ -p "Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review. Return exactly the required ## Summary, ## Findings, and ## Verdict structure." ``` **If `REVIEWER_CLI` is `codex`:** Round 1 — fresh `codex exec`: ```bash codex exec \ -m ${REVIEWER_MODEL} \ -s read-only \ -o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \ "Review the ${REVIEW_KIND} payload in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md. ${PROMPT_TEMPLATE}" ``` Do not capture the Codex session ID yet. After Round 1 completes, extract it from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` (look for `session id: `) and persist it to Runtime State under `${SESSION_ID_VAR}`. Round 2 and later — resume session: ```bash codex exec resume ${SESSION_ID_VAR_VALUE} \ -o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \ "I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md. Changes made: [List specific changes] Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before. Keep findings ordered P0 to P3, use '- None.' when a severity has no findings, and only use VERDICT: APPROVED when no P0, P1, or P2 findings remain." ``` If resume fails, fall back to fresh `codex exec` with prior-round context. **If `REVIEWER_CLI` is `claude`:** Fresh call every round (Claude CLI has no session resume): ```bash claude -p \ "${ROUND_PREFIX}Review the following ${REVIEW_KIND} payload. $(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md) ${PROMPT_TEMPLATE}" \ --model ${REVIEWER_MODEL} \ --strict-mcp-config \ --setting-sources user ``` Where `${ROUND_PREFIX}` is empty for Round 1 and `"You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points]. "` for subsequent rounds. **If `REVIEWER_CLI` is `cursor`:** Round 1: ```bash cursor-agent -p \ --mode=ask \ --model ${REVIEWER_MODEL} \ --trust \ --output-format json \ "Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review. ${PROMPT_TEMPLATE}" \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json ``` Round 2 and later — resume: ```bash cursor-agent --resume ${SESSION_ID_VAR_VALUE} -p \ --mode=ask \ --model ${REVIEWER_MODEL} \ --trust \ --output-format json \ "I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md. Changes made: [List specific changes] Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json ``` If resume fails, fall back to fresh `cursor-agent -p`. After the command completes, extract the session id and review text: ```bash CURSOR_SID=$(jq -r '.session_id' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json) jq -r '.result' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md ``` Persist `CURSOR_SID` to Runtime State under `${SESSION_ID_VAR}` on Round 1. **If `REVIEWER_CLI` is `opencode`:** OpenCode does not expose a dedicated read-only flag at the CLI level; use the built-in `plan` primary agent (`--agent plan`) for review, which is read-oriented and does not modify files. Session resume is supported via `-s `, but the most reliable pattern for non-interactive review is **fresh call each round** (like `claude`) because opencode's session lifecycle and ID capture are less standardized than codex/cursor for headless runs. Skills MAY opt-in to session resume when they have verified the installed opencode version exposes a stable session id in `--format json` output. Round 1 (preferred, fresh call): ```bash opencode run \ -m ${REVIEWER_MODEL} \ --agent plan \ --format json \ "Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review. ${PROMPT_TEMPLATE}" \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json ``` Round 2 and later (fresh-call fallback path — recommended default): ```bash opencode run \ -m ${REVIEWER_MODEL} \ --agent plan \ --format json \ "You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points from last review] I've revised. Updated payload is below. $(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md) Changes made: [List specific changes] Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json ``` Optional session-resume path (only if the installed opencode reliably emits a session id in `--format json` output and accepts it back via `-s`): ```bash # Round 2+ with resume opencode run \ -s ${SESSION_ID_VAR_VALUE} \ -m ${REVIEWER_MODEL} \ --agent plan \ --format json \ "I've revised. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md. Changes made: [List specific changes] Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json ``` Extract the review body (the JSON stream emits events; the final assistant message contains the review text): ```bash jq -r '.[] | select(.type == "message" and .role == "assistant") | .content' \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \ > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \ || cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md ``` If the JSON parse falls through, promote the raw JSON file as the review output and surface a warning to the user. On any opencode CLI or JSON parsing failure, treat this loop round as `completed-empty-output` and follow the helper-failure escalation in Step 6. ### Step 3: Run via `run-review.sh` Run the command script through the shared helper when available: ```bash if [ -x "$REVIEWER_RUNTIME" ]; then "$REVIEWER_RUNTIME" \ --command-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \ --stdout-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \ --stderr-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \ --status-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \ "${HELPER_SUCCESS_FILE_ARGS[@]}" else echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 bash /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \ >/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \ 2>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr fi ``` Run the helper in the foreground and watch live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll the `.status` file instead of treating heartbeats as post-hoc-only data. ### Step 4: Promote Reviewer Output + Capture Session ID After the command completes: - `cursor`: already promoted in Step 2 via `jq -r '.result' ...`. Also capture `session_id` if first round. - `codex`: extract `CODEX_SESSION_ID` from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text lives only in `.runner.out`, `cp` it into the `.md` file: ```bash cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md ``` - `claude` or `pi`: promote `.runner.out` into the `.md` file: ```bash cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md ``` - `opencode`: already promoted in Step 2 via `jq` on the JSON stream. If opt-in session-resume is active and the JSON includes a stable session id, capture it and persist to `${SESSION_ID_VAR}`. On Round 1, persist the captured session ID (if any) into `task-plan.md`'s Runtime State under `${SESSION_ID_VAR}`. ### Step 5: Parse Verdict + Update Review History 1. Read `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md`. 2. Append one row to `task-plan.md` Review History: - Timestamp (ISO-8601 UTC). - Loop (`plan` or `implementation`). - Round number. - Verdict (`APPROVED` or `REVISE`). - Summary (first line of the `## Summary` section). 3. Increment `plan_review_round` or `implementation_review_round` in Runtime State. ### Step 6: Branch APPROVED / REVISE / MAX_ROUNDS Verdict rules: - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → exit the subroutine with `APPROVED`. - **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if cheap and safe, then exit with `APPROVED`. - **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to revision (see below), then return to Step 1 for the next round. - No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as APPROVED. - Helper state `completed-empty-output` → treat as failed review attempt, surface `.stderr`/`.status`, fix invocation or prompt handling, then retry. - Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters. - Round counter ≥ `MAX_ROUNDS` → exit the subroutine with `MAX_ROUNDS`. Caller decides next action per Phase 5 or Phase 8. **Revision:** The caller (Phase 5 for plan, Phase 6/7 for implementation) applies findings in priority order (`P0` → `P1` → `P2` → `P3`). For implementation review revisions, Phase 7 verification must be re-run after every revision before returning to Step 1. ### Step 7: Liveness Contract (during Step 3) - The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive. - Keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute. - Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues. - Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals. ### Step 8: Cleanup (on successful round exit) ```bash rm -f /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \ /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh ``` If the round failed, produced empty output, or reached operator-decision timeout, KEEP `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them. --- ## Resume Semantics 1. Detect existing plan folder by slug at Phase 4. 2. Read `task-plan.md` → `Status`. 3. Decide next action: | Status | Action | |--------|--------| | `draft` | Resume at Phase 5 (plan review) | | `plan-approved` | Resume at Phase 6 (execute) | | `implementation-in-progress` | Resume at Phase 6 (continue execute) | | `implementation-approved` | Resume at Phase 9 (commit + push ask) | | `pushed` \| `local-only` | Ask user: new suffix, abort, or replay for reference only | | `aborted-*` \| `failed` | Offer new suffix or full restart | 4. When resuming, read Runtime State for `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, `OPENCODE_IMPL_SESSION_ID`, and the round counters. If a session ID is populated, use it for the first revision round in that loop (Round 2) via `codex exec resume`, `cursor-agent --resume`, or `opencode run -s ` as applicable. --- ## Tracker Discipline (MANDATORY) **ALWAYS update `task-plan.md` before/after each phase transition. NEVER proceed with stale state.** Before starting any phase: 1. Update `Status` if it transitions. 2. Update `last_phase_entered` in Runtime State. After completing any phase: 1. Update `Status` if it transitions. 2. Append notes to the relevant section of `task-plan.md`. Review History is append-only. --- ## Execution Workflow Rules - Current branch is the default; worktree is opt-in only. - Do NOT push without explicit "yes". - Secret scan runs **per-payload, no caching** — every round, including revisions. - Review loops use `MAX_ROUNDS=10` by default, shared across both loops. - The task commit is a single commit created in Phase 9; interim WIP commits are NOT created. - The `.gitignore` infra commit in Phase 1 is explicitly separate from the task commit and is allowed even on abort. --- ## Verification Checklist - [ ] `ai_plan/` exists and `/ai_plan/` is in `.gitignore` - [ ] `task-plan.md` created under `ai_plan/YYYY-MM-DD-/` - [ ] Reviewer CLI + model + `MAX_ROUNDS` configured (or `skip`) - [ ] Secret scan ran on every outbound reviewer payload - [ ] Plan review completed (APPROVED, MAX_ROUNDS handled, or skipped) - [ ] Phase 6 executed TDD-first for all behavior-changing steps (or documented skip) - [ ] Phase 7 verification green before Phase 8 - [ ] Implementation review completed (APPROVED, MAX_ROUNDS handled, or skipped) - [ ] Single task commit created locally, no push without explicit yes - [ ] Telegram notification attempted if configured - [ ] `task-plan.md` Final Status filled in --- ## Variant Hardening Notes — Codex - Must use native skill discovery from `~/.agents/skills/` (no CLI wrappers). - Must verify Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills` - Must invoke required sub-skills with explicit announcements before any action. - Must track checklist-driven sub-skills with `update_plan` todos (the Codex equivalent of `TodoWrite`). - `Task` subagents are unavailable in Codex — do the work directly and state the limitation in the plan if a subagent was expected. - Deprecated CLI commands (`superpowers-codex bootstrap`, `use-skill`) must NOT be used. - Helper paths are `~/.codex/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`. - No plan-mode guard (Codex has no plan-mode concept). ## Common Mistakes - Using deprecated commands like `superpowers-codex bootstrap` or `superpowers-codex use-skill`. - Jumping to Phase 6 execution without running `superpowers:brainstorming` on behavior-changing work. - Asking multiple clarifying questions in a single message. - Forgetting `update_plan` tracking for sub-skill checklists. - Forgetting to create/update `.gitignore` for `/ai_plan/`. - Skipping the per-payload secret scan because "the previous round was clean". - Pushing the task commit without explicit user approval. ## Red Flags — Stop and Correct - You are about to run any `superpowers-codex` command. - You are writing `task-plan.md` before the brainstorming step (when required) completes. - You are pushing commits without user approval. - You did not announce which skill you invoked and why. - You are proceeding to implementation review with failing lint/typecheck/tests. - You are echoing raw secret-scan matches to the user or logs.