ai-coding-skills/docs/DO-TASK.md
T
2026-05-03 21:09:22 -05:00


# DO-TASK
## Purpose
Execute a single user-supplied prompt end-to-end with **two reviewer loops** (plan review +
implementation review), with TDD-first execution, a pre-implementation verification gate, and a
single task commit — all in one run of the skill. `do-task` is scoped to small-to-medium ad-hoc
tasks; for multi-milestone work use `create-plan` + `implement-plan` instead.

`do-task` persists one plan artifact per run: `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`. The
folder is kept as a record after success (not deleted). Resume is supported via the `Status` enum
and Runtime State fields.
## Requirements
- Git repo with a `/ai_plan/` entry in `.gitignore` (the skill adds the entry automatically if
  missing and commits it as a separate infra commit).
- Superpowers skills installed from: https://github.com/obra/superpowers
- Required dependencies (vary by variant; see Install below):
  - `superpowers:brainstorming` (or `superpowers/brainstorming` for OpenCode)
  - `superpowers:test-driven-development`
  - `superpowers:verification-before-completion`
  - `superpowers:finishing-a-development-branch`
  - `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree)
- For Codex, native skill discovery must be configured:
  - `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- Cursor can use the Cursor Superpowers plugin cache or manual `.cursor/skills/superpowers/skills`
  / `~/.cursor/skills/superpowers/skills` installs, and `jq` is a hard prerequisite for the
  Cursor variant.
- OpenCode can use `~/.agents/skills/superpowers` or `~/.config/opencode/skills/superpowers`.
- The shared reviewer runtime (`run-review.sh`) **and** the Telegram notifier helper
  (`notify-telegram.sh`) must be installed beside the agent skills. Both scripts ship under
  `skills/reviewer-runtime/` in this repo (the Pi copies under `skills/reviewer-runtime/pi/`) and
  must be copied into the per-variant location:
  - Codex: `~/.codex/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
  - Claude Code: `~/.claude/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
  - OpenCode: `~/.config/opencode/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
  - Cursor: `.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}` (repo-local,
    preferred) or `~/.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
    (global fallback)
  - Pi: `.pi/skills/reviewer-runtime/pi/{run-review.sh,notify-telegram.sh}` (repo-local) or
    `~/.pi/agent/skills/reviewer-runtime/pi/{run-review.sh,notify-telegram.sh}` (global)
- Variant-specific prerequisites:
  - **Claude Code:** `claude --version`, explicit `Skill`-tool invocation of sub-skills.
  - **Codex:** `codex --version`; `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills` symlink present.
  - **Cursor:** `cursor-agent --version`, `jq --version` (hard prereq), Superpowers available
    from the Cursor plugin cache or manual Cursor skill roots.
  - **OpenCode:** `opencode --version`; Superpowers available from `~/.agents/skills/superpowers`
    or `~/.config/opencode/skills/superpowers`; Phase 1 runs Bootstrap Superpowers Context.
  - **Pi:** `pi --version`; Superpowers installed as described in [PI-SUPERPOWERS.md](./PI-SUPERPOWERS.md).
- Telegram notification setup is documented in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md).

Dependency-missing messages are variant-specific:
- **Claude Code:** `Missing dependency: [specific missing item]. Install required Superpowers
skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
- **Codex:** `Missing dependency: [specific missing item]. Install required Superpowers skills
(https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
- **Cursor:** `Missing dependency: [specific missing item]. Install Cursor Agent CLI, jq, and the
Cursor Superpowers plugin or Superpowers skills under .cursor/skills/ or ~/.cursor/skills/,
then retry.`
- **OpenCode:** `Missing dependency: [specific missing item]. Install required OpenCode
Superpowers skills (https://github.com/obra/superpowers, OpenCode setup) and the
reviewer-runtime helper, then retry.`
- **Pi:** `Missing dependency: [specific missing item]. Install Pi, required Superpowers skills,
and the Pi reviewer-runtime helper, then retry.`
### Reviewer CLI Requirements
The canonical reviewer CLI support matrix is documented in
[REVIEWERS.md](./REVIEWERS.md). At least one of these CLIs must be installed to drive the two
review loops (none is needed if you answer `skip` to the reviewer question):
| Reviewer CLI | Install | Verify | Read-Only Mode | Session Resume |
|---|---|---|---|---|
| `codex` | `npm install -g @openai/codex` | `codex --version` | `-s read-only` | Yes (`codex exec resume <id>`) |
| `claude` | `npm install -g @anthropic-ai/claude-code` | `claude --version` | `--strict-mcp-config --setting-sources user` | No (fresh call each round) |
| `cursor` | `curl https://cursor.com/install -fsS \| bash` | `cursor-agent --version` (binary: `cursor-agent`; alias `cursor agent` also works) | `--mode=ask` | Yes (`--resume <id>`) |
| `opencode` | `brew install opencode` or your package manager | `opencode --version` | `--agent plan` | Opt-in (`-s <id>`; fresh call is the default) |
| `pi` | Install Pi coding agent | `pi --version`; list models with `pi --list-models [search]` | `--tools read,grep,find,ls` | No (fresh call each round) |

The reviewer CLI is independent of which agent runs the skill: for example, Claude Code can send
both the plan and the implementation to Codex for review.

**Additional dependency for `cursor` reviewer:** `jq` is required to parse Cursor's JSON output.
Install via `brew install jq` (macOS) or your package manager. Verify: `jq --version`. The cursor
variant of `do-task` makes `jq` a hard prerequisite regardless of which reviewer CLI is selected.
## Install
### Codex
```bash
mkdir -p ~/.codex/skills/do-task
cp -R skills/do-task/codex/* ~/.codex/skills/do-task/
mkdir -p ~/.codex/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.codex/skills/reviewer-runtime/
chmod +x ~/.codex/skills/reviewer-runtime/*.sh
```
### Claude Code
```bash
mkdir -p ~/.claude/skills/do-task
cp -R skills/do-task/claude-code/* ~/.claude/skills/do-task/
mkdir -p ~/.claude/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.claude/skills/reviewer-runtime/
chmod +x ~/.claude/skills/reviewer-runtime/*.sh
```
### OpenCode
```bash
mkdir -p ~/.config/opencode/skills/do-task
cp -R skills/do-task/opencode/* ~/.config/opencode/skills/do-task/
mkdir -p ~/.config/opencode/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.config/opencode/skills/reviewer-runtime/
chmod +x ~/.config/opencode/skills/reviewer-runtime/*.sh
```
### Cursor
Copy into the repo-local `.cursor/skills/` directory (where the Cursor Agent CLI discovers skills):
```bash
mkdir -p .cursor/skills/do-task
cp -R skills/do-task/cursor/* .cursor/skills/do-task/
mkdir -p .cursor/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh .cursor/skills/reviewer-runtime/
chmod +x .cursor/skills/reviewer-runtime/*.sh
```
Or install globally (loaded via `~/.cursor/skills/`):
```bash
mkdir -p ~/.cursor/skills/do-task
cp -R skills/do-task/cursor/* ~/.cursor/skills/do-task/
mkdir -p ~/.cursor/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.cursor/skills/reviewer-runtime/
chmod +x ~/.cursor/skills/reviewer-runtime/*.sh
```
### Pi
Recommended full Pi package install:
```bash
./scripts/install-pi-package.sh --global
# or, for project-local Pi package install
./scripts/install-pi-package.sh --local
```
Manual single-skill Pi install from the package mirror:
```bash
pnpm run sync:pi
mkdir -p .pi/skills/do-task
cp -R pi-package/skills/do-task/* .pi/skills/do-task/
mkdir -p .pi/skills/reviewer-runtime/pi
cp -R skills/reviewer-runtime/pi/* .pi/skills/reviewer-runtime/pi/
chmod +x .pi/skills/reviewer-runtime/pi/*.sh
```
Global manual installs use `~/.pi/agent/skills/do-task/` and `~/.pi/agent/skills/reviewer-runtime/pi/` instead of `.pi/skills/...`.
Pi workflow skills also require Superpowers. See [PI-SUPERPOWERS.md](./PI-SUPERPOWERS.md) and [PI-COMMON-REVIEWER.md](./PI-COMMON-REVIEWER.md).
## Verify Installation
Run the per-variant checks below; together they cover everything the corresponding `SKILL.md`
enforces. Each block checks, in order: (1) the CLI binary version, (2) skill file presence,
(3) reviewer-runtime and notifier helper presence, (4) Superpowers sub-skill discovery, and
(5) variant-specific extras where applicable.
### Codex Verify
```bash
codex --version
test -f ~/.codex/skills/do-task/SKILL.md
test -x ~/.codex/skills/reviewer-runtime/run-review.sh
test -x ~/.codex/skills/reviewer-runtime/notify-telegram.sh
test -L ~/.agents/skills/superpowers
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```
### Claude Code Verify
```bash
claude --version
test -f ~/.claude/skills/do-task/SKILL.md
test -x ~/.claude/skills/reviewer-runtime/run-review.sh
test -x ~/.claude/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.claude/skills/superpowers/brainstorming/SKILL.md
test -f ~/.claude/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.claude/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.claude/skills/superpowers/finishing-a-development-branch/SKILL.md
```
### OpenCode Verify
```bash
opencode --version
test -f ~/.config/opencode/skills/do-task/SKILL.md
test -x ~/.config/opencode/skills/reviewer-runtime/run-review.sh
test -x ~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md || test -f ~/.config/opencode/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.config/opencode/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.config/opencode/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.config/opencode/skills/superpowers/finishing-a-development-branch/SKILL.md
```
### Cursor Verify
```bash
cursor-agent --version
jq --version
test -f .cursor/skills/do-task/SKILL.md || test -f ~/.cursor/skills/do-task/SKILL.md
test -x .cursor/skills/reviewer-runtime/run-review.sh || test -x ~/.cursor/skills/reviewer-runtime/run-review.sh
test -x .cursor/skills/reviewer-runtime/notify-telegram.sh || test -x ~/.cursor/skills/reviewer-runtime/notify-telegram.sh
test -f .cursor/skills/superpowers/skills/brainstorming/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/brainstorming/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/brainstorming/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/test-driven-development/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/test-driven-development/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/test-driven-development/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/verification-before-completion/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/finishing-a-development-branch/SKILL.md' -print -quit 2>/dev/null | grep -q .
```
### Pi Verify
```bash
pi --version
test -f .pi/skills/do-task/SKILL.md || test -f ~/.pi/agent/skills/do-task/SKILL.md
test -x .pi/skills/reviewer-runtime/pi/run-review.sh || test -x ~/.pi/agent/skills/reviewer-runtime/pi/run-review.sh
test -x .pi/skills/reviewer-runtime/pi/notify-telegram.sh || test -x ~/.pi/agent/skills/reviewer-runtime/pi/notify-telegram.sh
test -f .pi/skills/superpowers/brainstorming/SKILL.md || test -f ~/.pi/agent/skills/superpowers/brainstorming/SKILL.md || test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f .pi/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.pi/agent/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f .pi/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.pi/agent/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f .pi/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.pi/agent/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```
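
The `test` checks above are silent: in an interactive shell a missing file only changes `$?`. A minimal sketch of a reporting wrapper follows. `check` and `verify_codex` are hypothetical helper names (nothing in this repo ships them), and the paths simply mirror the Codex Verify list.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; `check` and `verify_codex` are not shipped by this repo.

# check LABEL CMD [ARGS...]: run CMD with output discarded, print PASS/FAIL,
# and return non-zero on failure so callers can count failures.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label"
    return 1
  fi
}

# verify_codex: run the Codex Verify list; the exit status is the failure count.
verify_codex() {
  local fails=0 skill
  check "codex CLI" codex --version || fails=$((fails + 1))
  check "do-task SKILL.md" test -f ~/.codex/skills/do-task/SKILL.md || fails=$((fails + 1))
  check "run-review.sh" test -x ~/.codex/skills/reviewer-runtime/run-review.sh || fails=$((fails + 1))
  check "notify-telegram.sh" test -x ~/.codex/skills/reviewer-runtime/notify-telegram.sh || fails=$((fails + 1))
  check "superpowers symlink" test -L ~/.agents/skills/superpowers || fails=$((fails + 1))
  for skill in brainstorming test-driven-development \
    verification-before-completion finishing-a-development-branch; do
    check "superpowers:$skill" test -f ~/.agents/skills/superpowers/"$skill"/SKILL.md ||
      fails=$((fails + 1))
  done
  return "$fails"
}
```

Usage: `verify_codex && echo "Codex install looks complete"`. For the Cursor and Pi checks that use `||` alternatives, wrap the whole expression, for example `check "do-task SKILL.md" bash -c 'test -f .cursor/skills/do-task/SKILL.md || test -f ~/.cursor/skills/do-task/SKILL.md'`.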
## Key Behavior
- Creates one persistent plan artifact at `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`.
- Ensures `/ai_plan/` is in `.gitignore`. If missing, adds it and creates a separate
`chore(gitignore): ignore ai_plan local planning artifacts` commit.
- Parses the user prompt, detects the trigger phrase, and asks 1-3 clarifying questions unless
the prompt already has a concrete target + outcome + unambiguous scope + resolvable identifiers.
- Invokes `superpowers:brainstorming` for any behavior-changing task (feature creation,
non-trivial bug fix, refactor, design decision). The only skip conditions are
`pure-documentation` and `pure-comment-whitespace-rename`.
- Asks which reviewer CLI, model, and max rounds to use (or accepts `skip` for no review).
"Use defaults" maps to `codex / gpt-5.4 / MAX_ROUNDS=10`.
- Runs the plan review loop (Phase 5) before implementation, iterating up to `MAX_ROUNDS`
(default 10) or until the reviewer returns `VERDICT: APPROVED`.
- Executes with TDD-first (Phase 6) via `superpowers:test-driven-development`. Auto-skip
permitted only for `pure-documentation` and `pure-comment-whitespace-rename`; all other skips
(including config-file additions) require explicit user approval, recorded in the TDD Approach
section with an ISO-8601 timestamp.
- Runs lint/typecheck/tests as a **verification gate** (Phase 7) before the implementation review loop.
- Runs the implementation review loop (Phase 8) against the diff + verification output,
iterating up to `MAX_ROUNDS` or until `APPROVED`.
- Scans every outbound reviewer payload for secrets (subroutine step 1a). Per-payload, no caching.
- Creates a **single commit** after the implementation review approves. Does NOT push. Asks the
user for explicit `yes` before any push.
- Defaults to the **current branch**. Worktree only on explicit opt-in (`"in a worktree"`,
`"use a worktree"`, `"on an isolated branch"`, `"on a new branch called X"`).
- Supports resume: detects existing folder by slug and uses `Status` + Runtime State to decide how to re-enter.
- Sends completion notifications through Telegram only when the shared setup in
[TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) is installed and configured.
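
The `.gitignore` step can be sketched as below, assuming it runs from the repository root. `ensure_ai_plan_ignored` is a hypothetical name; the commit subject is the one listed above. Committing with a `-- .gitignore` pathspec is one way to keep the infra commit separate even when other changes are already staged.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Phase 1 .gitignore infra commit; run from the repo root.

ensure_ai_plan_ignored() {
  # Nothing to do if the exact /ai_plan/ entry is already present.
  if [ -f .gitignore ] && grep -qxF '/ai_plan/' .gitignore; then
    return 0
  fi
  # Avoid gluing the entry onto a last line that has no trailing newline.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    printf '\n' >> .gitignore
  fi
  printf '/ai_plan/\n' >> .gitignore
  git add .gitignore
  # The pathspec limits the commit to .gitignore, even if other changes are staged.
  git commit -q -m 'chore(gitignore): ignore ai_plan local planning artifacts' -- .gitignore
}
```

Running it a second time is a no-op, so no duplicate infra commit is created on resume.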
## Dual Review Loops
`do-task` runs two review loops per successful run, with separate session IDs (for CLIs that support resume) so reviewer context never leaks across loops.
1. **Plan review loop (Phase 5)** — payload is the current `task-plan.md` with `Runtime State`
and `Review History` stripped. The reviewer evaluates whether the plan matches the prompt,
whether assumptions are surfaced, whether acceptance criteria are testable, whether the TDD
approach is appropriate, and whether there are missing files/risks/security concerns.
2. **Implementation review loop (Phase 8)** — payload is the approved task plan (without Runtime
State) + `git diff` (unstaged + staged) + verification output (lint, typecheck, tests). The
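reviewer evaluates correctness, code quality, test coverage, security, and regression risk.

Both loops share the same 9-step subroutine and the same `MAX_ROUNDS` limit (default 10); each
loop keeps its own round counter (`plan_review_round` and `implementation_review_round` in
Runtime State).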
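
A sketch of building the Phase 5 payload, assuming each template section starts with an exact `## <Section Name>` heading line. It drops the `Runtime State` and `Review History` sections and keeps everything else; the function name is hypothetical. For the Phase 8 payload only `Runtime State` would be dropped before the diff and verification output are appended.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build the Phase 5 payload from task-plan.md.
# Assumes each section begins with an exact "## <Section Name>" heading line.

strip_for_plan_review() {
  awk '
    # Any level-2 heading opens a new section; decide whether to drop it.
    /^## / { skip = ($0 == "## Runtime State" || $0 == "## Review History") }
    !skip
  ' "$1"
}
```

Usage: `strip_for_plan_review ai_plan/<folder>/task-plan.md > "/tmp/do-task-plan-$REVIEW_ID.md"`.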
### Subroutine Steps (inside each review loop)
1. Write payload to `/tmp/do-task-<kind>-<REVIEW_ID>.md`.
2. **Secret scan (step 1a)** — per-payload, no caching. See Secret Scan section below.
3. Generate reviewer command script at `/tmp/do-task-<kind>-review-<REVIEW_ID>.sh`.
4. Run via `reviewer-runtime/run-review.sh`.
5. Promote reviewer output and capture the session ID on Round 1; persist it to `task-plan.md`
Runtime State under the loop-specific variable (`CODEX_PLAN_SESSION_ID`,
`CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`,
`OPENCODE_PLAN_SESSION_ID`, or `OPENCODE_IMPL_SESSION_ID`).
6. Parse verdict; append an entry to Review History; bump the round counter.
7. Branch: `APPROVED` → exit, `REVISE` → caller revises and re-enters, `MAX_ROUNDS` reached → caller decides.
8. Liveness contract: wait while `In progress N` heartbeats arrive from the runner.
9. Cleanup temp artifacts on success.
### Reviewer Output Contract
- `P0` = total blocker
- `P1` = major risk
- `P2` = must-fix before approval
- `P3` = cosmetic / nice to have
- Each severity section uses `- None.` when empty.
- `VERDICT: APPROVED` is valid only when no `P0`, `P1`, or `P2` findings remain.
- `P3` findings are non-blocking, but the caller should still try to fix them when cheap and safe.
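
A sketch of checking this contract against a saved review. It assumes severity sections appear as `### P0` through `### P3` headings with one `- ` bullet per finding and a `VERDICT: <WORD>` line; that exact layout and the function names are assumptions, not the helper's actual parser.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; assumes "### P0".."### P3" severity headings, "- " bullets,
# and a "VERDICT: <WORD>" line.

# review_verdict FILE: print the word after the last "VERDICT:" line.
review_verdict() {
  sed -n 's/^VERDICT: *\([A-Z]*\).*/\1/p' "$1" | tail -n 1
}

# blocking_findings FILE: count P0-P2 bullets other than "- None.".
blocking_findings() {
  awk '
    /^#+ / { blocking = ($0 ~ /^### P[012]/) }
    blocking && /^- / && $0 != "- None." { n++ }
    END { print n + 0 }
  ' "$1"
}

# approval_ok FILE: true only for APPROVED with zero blocking findings.
approval_ok() {
  [ "$(review_verdict "$1")" = "APPROVED" ] && [ "$(blocking_findings "$1")" -eq 0 ]
}
```

A review that says `VERDICT: APPROVED` while still listing a `P1` item fails `approval_ok`, which is the case the contract rules out.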
## Runtime Artifacts
Per review loop (`<kind>` = `plan` or `implementation`):
- `/tmp/do-task-<kind>-<REVIEW_ID>.md` — payload
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.md` — normalized review text
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.json` — raw JSON (cursor always; opencode with `--format json`)
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.stderr` — reviewer stderr
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.status` — helper heartbeat/status log
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.runner.out` — helper-managed stdout
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.sh` — reviewer command script

Status log lines use this format:
```text
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
```
`in-progress` is the liveness heartbeat emitted roughly once per minute with `note="In progress N"`.
`stall-warning` is a non-terminal status-log state only. It does not mean the caller should
stop waiting if `in-progress` heartbeats continue.
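
A sketch of reading the newest status line and mapping its `state` to a caller action, following the rules in this section and in Failure Handling. The function names and action words (`wait`, `parse-verdict`, `diagnose`, `ask-operator`) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; function names and action words are illustrative.

# status_state FILE: print the state= value from the newest status-log line.
status_state() {
  tail -n 1 "$1" | sed -n 's/.* state=\([a-z-]*\) .*/\1/p'
}

# next_action STATE: what the caller does for a given helper state.
next_action() {
  case $1 in
    # stall-warning is non-terminal: keep waiting while heartbeats continue.
    running-silent|running-active|in-progress|stall-warning) echo wait ;;
    completed) echo parse-verdict ;;
    # Surface .stderr and .status; retry only after diagnosing.
    completed-empty-output|failed) echo diagnose ;;
    # Hard timeout reached: extend, abort, or retry with different parameters.
    needs-operator-decision) echo ask-operator ;;
    *) echo unknown ;;
  esac
}
```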
### Persistent Artifact
The one file kept across runs is `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`. Its `Status` enum drives resume decisions:
| Status | Meaning |
|---|---|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |
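
A hypothetical mapping from `Status` to a re-entry point, derived only from the table above. The phase labels are illustrative, and the skill also consults Runtime State (for example `last_phase_entered`), which this sketch ignores.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: Status -> re-entry point. Ignores Runtime State.
resume_action() {
  case $1 in
    draft) echo "phase-5-plan-review" ;;              # plan review not started
    plan-approved|implementation-in-progress)
      echo "phase-6-implementation" ;;
    implementation-approved) echo "phase-9-commit" ;; # awaiting the task commit
    pushed|local-only) echo "done" ;;                 # already committed
    aborted-plan-review|aborted-impl-review|aborted-verification|failed)
      echo "ask-user" ;;
    *) echo "unknown-status" >&2; return 1 ;;
  esac
}
```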
## Failure Handling
- `completed-empty-output` — the reviewer exited without producing review text; surface
`.stderr` and `.status`, then retry only after diagnosing the cause.
- `needs-operator-decision` — the helper reached hard-timeout escalation; surface `.status`
and decide whether to extend the timeout, abort, or retry with different parameters.
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds
retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
- Verification gate (Phase 7) retries up to 3 times. On exhaustion, `Status` becomes
`aborted-verification` and the user is asked whether to retry, override, or abort.
- As long as fresh `in-progress` heartbeats continue to arrive roughly once per minute, the caller keeps waiting.
## Secret Scan (subroutine step 1a; per-payload; no caching)
Every outbound reviewer payload is scanned **before** being sent to the reviewer CLI. This scan
runs on every round of both loops. No results are cached, because the Phase 8 payload includes
newly introduced diff content that earlier rounds never saw.
Canonical anchored regex list (10 patterns):
```text
AWS access key: AKIA[0-9A-Z]{16}
GCP service-acct: "type"\s*:\s*"service_account"
GitHub tokens: (ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
Slack tokens: xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
OpenAI API keys: sk-(proj-)?[A-Za-z0-9_-]{20,}
Anthropic API keys: sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
PEM private keys: -----BEGIN [A-Z ]+ PRIVATE KEY-----
.env-style: (TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
JWT: eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
```
If a match is found, the skill **redacts the matched text before showing it to the user** using
the fixed token `[REDACTED:<pattern-label>:<match-length>-chars]` (pattern labels:
`aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`,
`anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`). File paths and line numbers are kept.
Raw match text is never echoed to terminal, chat log, or any persistent file.
The user answers `yes` / `no` / `redact`:
- `yes` — proceed; Runtime State records `last_scan_outcome_<kind>=user-approved-with-matches`.
- `redact` — the user supplies redactions, the skill applies them, and re-scans before sending. Runtime State records `last_scan_outcome_<kind>=redacted-and-approved`.
- `no` — stop the loop, set `Status: failed`, send Telegram summary.
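
A minimal sketch of the scan and redaction step, assuming bash and `grep -E`. POSIX `[[:space:]]` stands in for `\s`, which is a GNU extension rather than POSIX ERE. `scan_payload` and the array names are hypothetical. Matches are reported only as redaction tokens; the raw text stays in a local variable and is never printed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of subroutine step 1a; labels and patterns mirror the list above.
SECRET_LABELS=(aws-access-key gcp-service-account github-token slack-token slack-token
  openai-key anthropic-key pem-private-key dotenv-style jwt)
SECRET_PATTERNS=(
  'AKIA[0-9A-Z]{16}'
  '"type"[[:space:]]*:[[:space:]]*"service_account"'
  '(ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}'
  'xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}'
  'xox[abpsr]-[A-Za-z0-9]{10,48}'
  'sk-(proj-)?[A-Za-z0-9_-]{20,}'
  'sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}'
  '-----BEGIN [A-Z ]+ PRIVATE KEY-----'
  "(TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)[[:space:]]*=[[:space:]]*[\"']?[A-Za-z0-9+/=_-]{8,}"
  'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+'
)

# scan_payload FILE: print "FILE:LINE:[REDACTED:<label>:<n>-chars]" per match.
# Returns 1 if anything matched (caller asks yes/no/redact), 0 if clean.
scan_payload() {
  local file=$1 found=0 i hit line match
  for i in "${!SECRET_PATTERNS[@]}"; do
    while IFS= read -r hit; do
      line=${hit%%:*}   # grep -n prefix
      match=${hit#*:}   # raw match: only its length is ever printed
      printf '%s:%s:[REDACTED:%s:%d-chars]\n' "$file" "$line" "${SECRET_LABELS[$i]}" "${#match}"
      found=1
    done < <(grep -noE -e "${SECRET_PATTERNS[$i]}" "$file")  # -e: PEM pattern starts with '-'
  done
  return "$found"
}
```

A non-zero return means the caller must stop and ask `yes` / `no` / `redact`. Two properties of the canonical list show up when it is exercised this way: an `sk-ant-...` key also matches the `openai-key` pattern, so it is reported under both labels, and the PEM pattern as written needs a word such as `RSA` before `PRIVATE KEY`, so a bare `-----BEGIN PRIVATE KEY-----` header does not match.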
## Supported Reviewer CLIs
| CLI | Round-1 command | Round-N resume | Output capture |
|---|---|---|---|
| `codex` | `codex exec -m <model> -s read-only -o <out.md> "<prompt>"` | `codex exec resume <session-id> -o <out.md> "<prompt>"` | `<out.md>` directly (helper `--success-file`) |
| `claude` | `claude -p "<prompt>" --model <model> --strict-mcp-config --setting-sources user` | Fresh call with prior-round context summary | `cp <runner.out> <out.md>` |
| `cursor` | `cursor-agent -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `cursor-agent --resume <id> -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `jq -r '.result' <out.json> > <out.md>` |
| `opencode` | `opencode run -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` | Fresh call (default) OR `opencode run -s <id> -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` (opt-in) | `jq -r '.[] \| select(.type == "message" and .role == "assistant") \| .content' <out.json> > <out.md>` |
| `pi` | See [PI-COMMON-REVIEWER.md](./PI-COMMON-REVIEWER.md) | Fresh call | Markdown stdout copied to `<out.md>` |

For all supported reviewer CLIs, the preferred execution path is:
1. Write the reviewer command to a bash script.
2. Run that script through `reviewer-runtime/run-review.sh`.
3. Fall back to direct synchronous execution only if the helper is missing or not executable.
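
A sketch of that path for a Round-1 `codex` review. It assumes the helper accepts the `--command-file`, `--stdout-file`, `--stderr-file`, and `--status-file` flags shown under Notifications; the function names and prompt text are hypothetical, and other flags the skill may pass (such as `--success-file` for `codex`) are omitted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: function names and prompt text are illustrative.

# write_codex_review_script KIND REVIEW_ID MODEL PROMPT: write the Round-1
# reviewer command to /tmp and print the script path.
write_codex_review_script() {
  local kind=$1 id=$2 model=$3 prompt=$4
  local script="/tmp/do-task-${kind}-review-${id}.sh"
  local out="/tmp/do-task-${kind}-review-${id}.md"
  # %q shell-quotes each value so the prompt stays a single argument.
  printf 'codex exec -m %q -s read-only -o %q %q\n' "$model" "$out" "$prompt" > "$script"
  printf '%s\n' "$script"
}

# run_review_script SCRIPT HELPER: prefer run-review.sh, else run directly.
run_review_script() {
  local script=$1 helper=$2
  local base=${script%.sh}
  if [ -x "$helper" ]; then
    "$helper" --command-file "$script" \
      --stdout-file "$base.runner.out" \
      --stderr-file "$base.stderr" \
      --status-file "$base.status"
  else
    # Fallback: synchronous execution with no heartbeat/status log.
    bash "$script" > "$base.runner.out" 2> "$base.stderr"
  fi
}
```

Usage: `s=$(write_codex_review_script plan "$REVIEW_ID" gpt-5.4 "Review the attached plan")` followed by `run_review_script "$s" ~/.codex/skills/reviewer-runtime/run-review.sh`. The fallback branch produces no `.status` heartbeats, which is why it is only the last resort.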
## Pi Reviewer Support
All workflow variants can use Pi itself as a reviewer CLI. Use `pi/<pi-model-name>` shorthand,
for example `pi/claude-opus-4-7`; this means `REVIEWER_CLI=pi` and
`REVIEWER_MODEL=claude-opus-4-7`. Provider-qualified or multi-slash Pi model IDs are preserved
after the first `pi/` prefix, for example `pi/anthropic/claude-opus-4-7`.
The canonical isolated read-only Pi reviewer flag contract lives in
[PI-COMMON-REVIEWER.md](./PI-COMMON-REVIEWER.md). This workflow passes the plan and
implementation review payload at `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` and expects the
standard `## Summary`, `## Findings`, and `## Verdict` response. Pi reviewer output is captured
as markdown stdout, not JSON.
If the Pi reviewer model or provider is unavailable, surface the helper stderr/status and use
`pi --list-models [search]` to inspect configured models.
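
The shorthand rule above (split on the first `pi/` only) can be sketched with bash parameter expansion. `parse_reviewer_spec` is a hypothetical name; `REVIEWER_CLI` and `REVIEWER_MODEL` are the variable names used in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "pi/<model>" into REVIEWER_CLI and REVIEWER_MODEL.
parse_reviewer_spec() {
  case $1 in
    # Strip only the first "pi/", so provider-qualified IDs keep their slashes.
    pi/*) REVIEWER_CLI=pi; REVIEWER_MODEL=${1#pi/} ;;
    *) return 1 ;;  # not Pi shorthand; handled elsewhere
  esac
}
```

With `pi/anthropic/claude-opus-4-7` this sets `REVIEWER_CLI=pi` and `REVIEWER_MODEL=anthropic/claude-opus-4-7`.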
## Notifications
- Telegram is the only supported notification path.
- Shared setup: [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
- Notification failures are non-blocking, but they must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, the skill sends a Telegram summary first if configured.
- Terminal outcomes that trigger Telegram: `pushed`, `local-only`, `aborted-plan-review`,
`aborted-impl-review`, `aborted-verification`, `failed`.

The reviewer-runtime helper also supports manual override flags for diagnostics:
```bash
run-review.sh \
--command-file <path> \
--stdout-file <path> \
--stderr-file <path> \
--status-file <path> \
--poll-seconds 10 \
--soft-timeout-seconds 600 \
--stall-warning-seconds 300 \
--hard-timeout-seconds 1800
```
## Template Guardrails
All four `templates/task-plan.md` files share identical core sections (14 `##`-level headings)
and identical Status enum (10 values). Variant-specific guardrail language is permitted in the
leading blockquote and in the `Runtime` field of the Metadata table.

**Core sections** (appear in every variant, same order):
1. Metadata
2. Prompt
3. Interpretation
4. Assumptions
5. Files
6. Approach
7. TDD Approach
8. Acceptance Criteria
9. Verification
10. Rollback
11. Runtime State
12. Review History
13. Final Status
14. Guardrails (do NOT remove)

**Runtime State keys** (same across all variants): `plan_review_round`,
`implementation_review_round`, `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`,
`CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`,
`OPENCODE_IMPL_SESSION_ID`, `last_phase_entered`, `last_round_ts`, `last_scan_outcome_plan`,
`last_scan_outcome_impl`, `verification_attempts`, `tests_added_count`, `tdd_used`.
## Variant Hardening Notes
### Claude Code Hardening
- Must invoke the required sub-skills explicitly via the `Skill` tool:
  - `superpowers:brainstorming`
  - `superpowers:test-driven-development`
  - `superpowers:verification-before-completion`
  - `superpowers:finishing-a-development-branch`
  - `superpowers:using-git-worktrees` (conditional)
- Must enforce the plan-mode file-write guard in Phase 4:
  - If currently in plan mode, instruct the user to exit plan mode before writing `task-plan.md`.
### Codex Hardening
- Must use native skill discovery from `~/.agents/skills/` (no CLI wrappers).
- Must verify Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- Must invoke required sub-skills with explicit announcements before any action.
- Must track checklist-driven sub-skills with `update_plan` todos (Codex equivalent of `TodoWrite`).
- `Task` subagents are unavailable — do the work directly and state the limitation.
- Deprecated CLI commands (`superpowers-codex bootstrap`, `use-skill`) must NOT be used.
- Helper paths: `~/.codex/skills/reviewer-runtime/...`.
- No plan-mode guard (Codex has no plan-mode concept).
### OpenCode Hardening
- Must use OpenCode's native skill tool (not Claude's `Skill` tool syntax). OpenCode may load
shared skill files from `~/.agents/skills/`, but invocation is still OpenCode-native.
- Phase 1 includes a Bootstrap Superpowers Context step that lists installed skills and confirms
the required `superpowers/<skill>` set is discoverable before any other phase runs.
- Must verify Superpowers skill discovery under `~/.agents/skills/superpowers` or `~/.config/opencode/skills/superpowers`.
- Helper paths: `~/.config/opencode/skills/reviewer-runtime/...`.
- OpenCode reviewer calls MUST use `--agent plan` (the built-in plan primary agent) for a read-only posture.
- No plan-mode guard (OpenCode has no plan-mode concept).
### Cursor Hardening
- Must use Cursor-native discovery from `.cursor/skills/`, `~/.cursor/skills/`, or installed Cursor plugin cache entries.
- Must announce skill usage explicitly before invocation.
- `jq` is a hard prerequisite.
- Helper paths: `.cursor/skills/reviewer-runtime/...` preferred, `~/.cursor/skills/reviewer-runtime/...` fallback.
- Reviewer invocations MUST use `--mode=ask --trust --output-format json`. Never `--mode=agent`,
never `--force`, never write-capable modes for reviewer calls.
- No plan-mode guard (Cursor has no plan-mode concept).
## Execution Workflow Rules
- The skill works from `ai_plan/YYYY-MM-DD-<slug>/task-plan.md` as its single persistent artifact.
- Current branch is the default; worktree is opt-in only through explicit trigger phrases.
- Plan review completes before any implementation starts.
- Phase 7 verification gate must pass before the implementation review starts.
- The task commit is a single commit created in Phase 9.
- The `.gitignore` infra commit (Phase 1) is explicitly separate from the task commit and is
allowed even when the final task ends up `aborted` or `failed`.
- No push without explicit `yes` from the user.
- Secret scan runs per-payload with no caching.
- `MAX_ROUNDS=10` is shared across both loops (single mental model).
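
A rough sketch of the worktree opt-in check, using the trigger phrases listed under Key Behavior. `wants_worktree` is hypothetical, and it does plain substring matching on the lower-cased prompt; the detection in `SKILL.md` may be more nuanced.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: plain substring matching on the lower-cased prompt.
wants_worktree() {
  local prompt
  prompt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $prompt in
    *"in a worktree"* | *"use a worktree"* | *"on an isolated branch"* | *"on a new branch called "*)
      return 0 ;;
  esac
  return 1  # default: stay on the current branch
}
```

Note that a prompt such as "Add a worktree helper script" mentions a worktree without opting in, so it keeps the current-branch default.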