Compare commits

...

6 Commits

Author SHA1 Message Date
Stefano Fiorini c98f27f461 feat(do-task): add do-task skill with dual-loop review
M7 final — adds CLI version checks (claude --version, codex --version)
to the claude-code and codex Prerequisite Check sections so docs and
skill requirements agree. This addresses the only P3 finding from
the M7 review (non-blocking docs-to-skill mismatch).

All 7 milestones delivered across 6 local commits:
- 437b202 M1+M2: claude-code canonical + M1 canonical specs
- d69da3a M3: codex variant
- f404792 M4: cursor variant
- f5161f5 M5: opencode variant
- 9853d49 M6: docs/DO-TASK.md + README updates
- this commit (M7): final gate + P3 fix

do-task ships four runtime variants (claude-code, codex, cursor,
opencode) that execute a single user-supplied prompt end-to-end
with:
- Plan review loop and implementation review loop, each against a
  reviewer CLI (codex, claude, cursor, or opencode) with up to
  MAX_ROUNDS=10 revisions until APPROVED.
- Parameterized shared Review Loop subroutine (9 steps) invoked
  twice per run with loop-distinct session IDs so reviewer context
  never leaks across loops.
- Per-payload secret scan (10 anchored regexes; no caching) with
  redacted-only user surfacing (`[REDACTED:<label>:<n>-chars]`).
- TDD-first execution via superpowers:test-driven-development with
  strict auto-skip limited to pure-documentation and
  pure-comment-whitespace-rename.
- Verification gate (lint/typecheck/tests) before the
  implementation review loop.
- Single task commit after implementation APPROVED; explicit "yes"
  required to push.
- Telegram notification on every terminal outcome.
- Persistent ai_plan/YYYY-MM-DD-<slug>/task-plan.md with 10-value
  Status enum and resume semantics.

Reviewer: codex / gpt-5.4. Total review rounds across the 7
milestones: 3+1+1+1+2+1 = 9.

M7 smoke checks (all passing):
- YAML frontmatter: 4/4 parse.
- Core-section schema: 4/4 identical (14 sections, 10 Status enum
  values, 15 Runtime State keys).
- docs/DO-TASK.md links resolve.
- 4 variant Phase 1-10 dry-runs coherent; subroutine invoked from
  Phases 5 and 8 with distinct SESSION_ID_VAR per loop.
- Opencode reviewer branch passes `bash -n` on rendered Round-1,
  Round-N (fresh-call), and Round-N (resume) templates.
- Resume-state walk through Phase 4 reads all 6 session-id keys
  correctly.
- Trigger audit: "implement this" only as exclusion; dropped
  phrases only under "Dropped defaults".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:48:06 -05:00
Stefano Fiorini 9853d4937b docs(do-task): add DO-TASK.md + README updates (M6)
docs/DO-TASK.md covers:
- Purpose, Requirements (variant-specific prereqs + dependency-
  missing messages per variant), Reviewer CLI Requirements table
  (4 CLIs including opencode with fresh-call default).
- Install (4 subsections: Codex, Claude Code, OpenCode, Cursor).
- Per-variant Verify Installation subsections checking CLI binary,
  SKILL.md, run-review.sh, notify-telegram.sh, Superpowers
  sub-skills, and variant extras (Codex symlink, Cursor jq,
  OpenCode Superpowers ls, Cursor repo-vs-global lookup).
- Key Behavior, Dual Review Loops, Subroutine Steps, Reviewer Output
  Contract, Runtime Artifacts, Persistent Artifact Status enum
  (10 values), Failure Handling.
- Secret Scan (subroutine step 1a; per-payload; no caching) with
  canonical 10-pattern regex list and redaction contract.
- Supported Reviewer CLIs table (4 rows, including opencode).
- Notifications, Template Guardrails (14 core sections + Runtime
  State keys), Variant Hardening Notes (4 subsections), Execution
  Workflow Rules.

docs/README.md adds DO-TASK.md entry.

README.md:
- Skills table adds 4 do-task rows (codex, claude-code, opencode,
  cursor).
- Docs links add "Do-task guide" entry.
- Repository Layout adds do-task/ subdirectory.

Reviewer: codex / gpt-5.4. Approved round 2:
- Round 1: 2 P2 (prereqs inaccurate, Verify Installation incomplete)
  + 1 P3 -> REVISE.
- Round 2: 0 P0/P1/P2/P3 -> APPROVED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:40:49 -05:00
Stefano Fiorini f5161f584d feat(do-task): add opencode variant SKILL.md + template (M5)
Ports the claude-code canonical to OpenCode conventions:
- Phase 1 adds Bootstrap Superpowers Context step — uses OpenCode's
  native skill tool to list + verify superpowers/brainstorming,
  superpowers/test-driven-development,
  superpowers/verification-before-completion, and
  superpowers/finishing-a-development-branch before any other phase.
- All sub-skill invocations swapped to OpenCode native skill tool
  with superpowers/<skill> (slash) path format.
- Helper paths swapped to ~/.config/opencode/skills/reviewer-runtime/
  and ~/.config/opencode/skills/do-task/templates/.
- Plan-mode guard removed (OpenCode has no plan-mode concept).
- Prerequisite Check adds opencode --version + Superpowers symlink
  at ~/.config/opencode/skills/superpowers verification.
- Opencode reviewer branch (from M1 research) uses
  `opencode run -m <provider>/<model> --agent plan --format json`
  with fresh-call default and documented opt-in -s <id> resume path.
- Added Required Skill Invocation Rules, Variant Hardening Notes
  — OpenCode, Common Mistakes, and Red Flags sections.
- Template runtime field = opencode; guardrail line updated.

Core-section schema identical across all 4 variants (14 sections).

Reviewer: codex / gpt-5.4. Approved round 1 (0 P0/P1/P2/P3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:30:25 -05:00
Stefano Fiorini f404792927 feat(do-task): add cursor variant SKILL.md + template (M4)
Ports the claude-code canonical to Cursor Agent CLI conventions:
- Workspace skill discovery (.cursor/skills/ repo-local preferred,
  ~/.cursor/skills/ global fallback) replaces Skill-tool invocations.
- Helper path resolution prefers .cursor/skills/reviewer-runtime/
  over ~/.cursor/skills/reviewer-runtime/.
- jq added as a hard prerequisite; cursor-agent --version check added.
- Plan-mode guard removed (Cursor has no plan-mode concept).
- Reviewer invocations mandated to --mode=ask --trust
  --output-format json; explicit ban on --mode=agent and --force.
- Added Required Skill Invocation Rules, Variant Hardening Notes
  — Cursor, Common Mistakes, and Red Flags sections.
- Template runtime field = cursor; guardrail line updated.

Core-section schema identical to claude-code canonical.

Reviewer: codex / gpt-5.4. Approved round 1 (0 P0/P1/P2/P3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:24:53 -05:00
Stefano Fiorini d69da3a4a8 feat(do-task): add codex variant SKILL.md + template (M3)
Ports the claude-code canonical to Codex conventions:
- Native skill discovery from ~/.agents/skills/superpowers/<skill>/
  replaces Skill-tool invocations.
- update_plan todos replace Task subagents.
- Helper paths swapped to ~/.codex/skills/reviewer-runtime/.
- Plan-mode guard removed (Codex has no plan-mode concept).
- Prerequisite Check adds symlink verification
  (~/.agents/skills/superpowers -> ~/.codex/superpowers/skills).
- Added Required Skill Invocation Rules, Variant Hardening Notes
  — Codex, Common Mistakes, and Red Flags sections.
- Template runtime field = codex; guardrail line updated.

Core-section schema identical to claude-code canonical.
Frontmatter parses cleanly. Trigger-phrase audit clean.

Reviewer: codex / gpt-5.4. Approved round 1 (0 P0/P1/P2/P3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:19:13 -05:00
Stefano Fiorini 437b2024cd feat(do-task): add claude-code variant SKILL.md + template (M1+M2)
M1 canonical specs (opencode reviewer research, task-plan template,
Review Loop subroutine, Phase 1-10 prose, secret-scan regex list)
are embedded in the M2 claude-code SKILL.md as the canonical
reference; later variants (M3 codex, M4 cursor, M5 opencode) will
fork from this file.

Reviewer: codex / gpt-5.4. Approved after 3 rounds:
- Round 1: 2 P1 + 3 P2 → REVISE
- Round 2: 1 P2 → REVISE
- Round 3: 0 P0/P1/P2, 1 P3 (non-blocking) → APPROVED

Key design properties:
- Plan-review payload strips Runtime State and Review History to
  prevent reviewer session-ID leakage across rounds.
- Secret-scan step 1a redacts matched text to
  [REDACTED:<pattern-label>:<match-length>-chars] before any user
  surfacing; never echoes raw match content.
- Brainstorming required for any behavior-changing task; auto-skip
  limited to pure-documentation and pure-comment-whitespace-rename.
- Phase 3 reviewer config defaults to codex / gpt-5.4 with
  MAX_ROUNDS=10 when user opts for defaults; explicit interactive
  default also gpt-5.4 for internal consistency.
- Template Metadata records Branch Name and Worktree Path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:13:25 -05:00
11 changed files with 4139 additions and 0 deletions
+10
View File
@@ -34,6 +34,11 @@ ai-coding-skills/
│ │ ├── claude-code/
│ │ ├── opencode/
│ │ └── cursor/
│ ├── do-task/
│ │ ├── codex/
│ │ ├── claude-code/
│ │ ├── opencode/
│ │ └── cursor/
│ ├── implement-plan/
│ │ ├── codex/
│ │ ├── claude-code/
@@ -64,6 +69,10 @@ ai-coding-skills/
| create-plan | claude-code | Structured planning with milestones, iterative cross-model review, and runbook-first execution workflow | Ready | [CREATE-PLAN](docs/CREATE-PLAN.md) |
| create-plan | opencode | Structured planning with milestones, iterative cross-model review, and runbook-first execution workflow | Ready | [CREATE-PLAN](docs/CREATE-PLAN.md) |
| create-plan | cursor | Structured planning with milestones, iterative cross-model review, and runbook-first execution workflow | Ready | [CREATE-PLAN](docs/CREATE-PLAN.md) |
| do-task | codex | Single-prompt end-to-end execution with dual reviewer loops (plan + implementation), TDD-first, single task commit | Ready | [DO-TASK](docs/DO-TASK.md) |
| do-task | claude-code | Single-prompt end-to-end execution with dual reviewer loops (plan + implementation), TDD-first, single task commit | Ready | [DO-TASK](docs/DO-TASK.md) |
| do-task | opencode | Single-prompt end-to-end execution with dual reviewer loops (plan + implementation), TDD-first, single task commit | Ready | [DO-TASK](docs/DO-TASK.md) |
| do-task | cursor | Single-prompt end-to-end execution with dual reviewer loops (plan + implementation), TDD-first, single task commit | Ready | [DO-TASK](docs/DO-TASK.md) |
| implement-plan | codex | Worktree-isolated plan execution with iterative cross-model milestone review | Ready | [IMPLEMENT-PLAN](docs/IMPLEMENT-PLAN.md) |
| implement-plan | claude-code | Worktree-isolated plan execution with iterative cross-model milestone review | Ready | [IMPLEMENT-PLAN](docs/IMPLEMENT-PLAN.md) |
| implement-plan | opencode | Worktree-isolated plan execution with iterative cross-model milestone review | Ready | [IMPLEMENT-PLAN](docs/IMPLEMENT-PLAN.md) |
@@ -75,6 +84,7 @@ ai-coding-skills/
- Docs index: `docs/README.md`
- Atlassian guide: `docs/ATLASSIAN.md`
- Create-plan guide: `docs/CREATE-PLAN.md`
- Do-task guide: `docs/DO-TASK.md`
- Implement-plan guide: `docs/IMPLEMENT-PLAN.md`
- Web-automation guide: `docs/WEB-AUTOMATION.md`
+397
View File
@@ -0,0 +1,397 @@
# DO-TASK
## Purpose
Execute a single user-supplied prompt end-to-end with **two reviewer loops** (plan review + implementation review), with TDD-first execution, a pre-implementation verification gate, and a single task commit — all in one run of the skill. `do-task` is scoped to small-to-medium ad-hoc tasks; for multi-milestone work use `create-plan` + `implement-plan` instead.
`do-task` persists one plan artifact per run: `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`. The folder is kept as a record after success (not deleted). Resume is supported via the `Status` enum and Runtime State fields.
## Requirements
- Git repo with `/ai_plan/` entry in `.gitignore` (the skill adds the entry automatically if missing and commits it as a separate infra commit).
- Superpowers skills installed from: https://github.com/obra/superpowers
- Required dependencies (vary by variant; see Install below):
- `superpowers:brainstorming` (or `superpowers/brainstorming` for OpenCode)
- `superpowers:test-driven-development`
- `superpowers:verification-before-completion`
- `superpowers:finishing-a-development-branch`
- `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree)
- For Codex, native skill discovery must be configured:
- `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- For Cursor, skills must be installed under `.cursor/skills/` (repo-local) or `~/.cursor/skills/` (global), and `jq` is a hard prerequisite.
- For OpenCode, Superpowers must be installed at `~/.config/opencode/skills/superpowers`.
- Shared reviewer runtime (`run-review.sh`) AND Telegram notifier helper (`notify-telegram.sh`) must be installed beside agent skills. Both scripts ship under `skills/reviewer-runtime/` in this repo and must be copied into the per-variant location:
- Codex: `~/.codex/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
- Claude Code: `~/.claude/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
- OpenCode: `~/.config/opencode/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
- Cursor: `.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}` (repo-local, preferred) or `~/.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}` (global fallback)
- Variant-specific prerequisites:
- **Claude Code:** `claude --version`, explicit `Skill`-tool invocation of sub-skills.
- **Codex:** `codex --version`; `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills` symlink present.
- **Cursor:** `cursor-agent --version`, `jq --version` (hard prereq), Superpowers installed under `.cursor/skills/` or `~/.cursor/skills/`.
- **OpenCode:** `opencode --version`; Superpowers installed at `~/.config/opencode/skills/superpowers`; Phase 1 runs Bootstrap Superpowers Context.
- Telegram notification setup is documented in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
Dependency-missing messages are variant-specific:
- **Claude Code:** `Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
- **Codex:** `Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
- **Cursor:** `Missing dependency: [specific missing item]. Install Cursor Agent CLI, jq, and Superpowers skills under .cursor/skills/ or ~/.cursor/skills/, then retry.`
- **OpenCode:** `Missing dependency: [specific missing item]. Install required OpenCode Superpowers skills (https://github.com/obra/superpowers, OpenCode setup) and the reviewer-runtime helper, then retry.`
### Reviewer CLI Requirements
One of these CLIs must be installed to drive either of the two review loops:
| Reviewer CLI | Install | Verify | Read-Only Mode | Session Resume |
|---|---|---|---|---|
| `codex` | `npm install -g @openai/codex` | `codex --version` | `-s read-only` | Yes (`codex exec resume <id>`) |
| `claude` | `npm install -g @anthropic-ai/claude-code` | `claude --version` | `--strict-mcp-config --setting-sources user` | No (fresh call each round) |
| `cursor` | `curl https://cursor.com/install -fsS \| bash` | `cursor-agent --version` (binary: `cursor-agent`; alias `cursor agent` also works) | `--mode=ask` | Yes (`--resume <id>`) |
| `opencode` | `brew install opencode` or your package manager | `opencode --version` | `--agent plan` | Opt-in (`-s <id>`; fresh call is the default) |
The reviewer CLI is independent of which agent is running the skill — e.g., Claude Code can send both the plan and the implementation to Codex for review.
**Additional dependency for `cursor` reviewer:** `jq` is required to parse Cursor's JSON output. Install via `brew install jq` (macOS) or your package manager. Verify: `jq --version`. The cursor variant of `do-task` makes `jq` a hard prerequisite regardless of which reviewer CLI is selected.
## Install
### Codex
```bash
mkdir -p ~/.codex/skills/do-task
cp -R skills/do-task/codex/* ~/.codex/skills/do-task/
mkdir -p ~/.codex/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.codex/skills/reviewer-runtime/
```
### Claude Code
```bash
mkdir -p ~/.claude/skills/do-task
cp -R skills/do-task/claude-code/* ~/.claude/skills/do-task/
mkdir -p ~/.claude/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.claude/skills/reviewer-runtime/
```
### OpenCode
```bash
mkdir -p ~/.config/opencode/skills/do-task
cp -R skills/do-task/opencode/* ~/.config/opencode/skills/do-task/
mkdir -p ~/.config/opencode/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.config/opencode/skills/reviewer-runtime/
```
### Cursor
Copy into the repo-local `.cursor/skills/` directory (where the Cursor Agent CLI discovers skills):
```bash
mkdir -p .cursor/skills/do-task
cp -R skills/do-task/cursor/* .cursor/skills/do-task/
mkdir -p .cursor/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* .cursor/skills/reviewer-runtime/
```
Or install globally (loaded via `~/.cursor/skills/`):
```bash
mkdir -p ~/.cursor/skills/do-task
cp -R skills/do-task/cursor/* ~/.cursor/skills/do-task/
mkdir -p ~/.cursor/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.cursor/skills/reviewer-runtime/
```
## Verify Installation
Run the per-variant checks for everything the corresponding `SKILL.md` enforces. Each check is structured: (1) CLI binary version, (2) skill file presence, (3) reviewer-runtime + notifier helper presence, (4) Superpowers sub-skill discovery, (5) variant-specific extras.
### Codex
```bash
codex --version
test -f ~/.codex/skills/do-task/SKILL.md
test -x ~/.codex/skills/reviewer-runtime/run-review.sh
test -x ~/.codex/skills/reviewer-runtime/notify-telegram.sh
test -L ~/.agents/skills/superpowers
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```
### Claude Code
```bash
claude --version
test -f ~/.claude/skills/do-task/SKILL.md
test -x ~/.claude/skills/reviewer-runtime/run-review.sh
test -x ~/.claude/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.claude/skills/superpowers/brainstorming/SKILL.md
test -f ~/.claude/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.claude/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.claude/skills/superpowers/finishing-a-development-branch/SKILL.md
```
### OpenCode
```bash
opencode --version
test -f ~/.config/opencode/skills/do-task/SKILL.md
test -x ~/.config/opencode/skills/reviewer-runtime/run-review.sh
test -x ~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
ls -l ~/.config/opencode/skills/superpowers
test -f ~/.config/opencode/skills/superpowers/brainstorming/SKILL.md
test -f ~/.config/opencode/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.config/opencode/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.config/opencode/skills/superpowers/finishing-a-development-branch/SKILL.md
```
### Cursor
```bash
cursor-agent --version
jq --version
test -f .cursor/skills/do-task/SKILL.md || test -f ~/.cursor/skills/do-task/SKILL.md
test -x .cursor/skills/reviewer-runtime/run-review.sh || test -x ~/.cursor/skills/reviewer-runtime/run-review.sh
test -x .cursor/skills/reviewer-runtime/notify-telegram.sh || test -x ~/.cursor/skills/reviewer-runtime/notify-telegram.sh
test -f .cursor/skills/superpowers/skills/brainstorming/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/brainstorming/SKILL.md
test -f .cursor/skills/superpowers/skills/test-driven-development/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/test-driven-development/SKILL.md
test -f .cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/verification-before-completion/SKILL.md
test -f .cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md
```
## Key Behavior
- Creates one persistent plan artifact at `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`.
- Ensures `/ai_plan/` is in `.gitignore`. If missing, adds it and creates a separate `chore(gitignore): ignore ai_plan local planning artifacts` commit.
- Parses the user prompt, detects the trigger phrase, and asks 1-3 clarifying questions unless the prompt already has a concrete target + outcome + unambiguous scope + resolvable identifiers.
- Invokes `superpowers:brainstorming` for any behavior-changing task (feature creation, non-trivial bug fix, refactor, design decision). The only skip conditions are `pure-documentation` and `pure-comment-whitespace-rename`.
- Asks which reviewer CLI, model, and max rounds to use (or accepts `skip` for no review). "Use defaults" maps to `codex / gpt-5.4 / MAX_ROUNDS=10`.
- Runs the plan review loop (Phase 5) before implementation, iterating up to `MAX_ROUNDS` (default 10) or until the reviewer returns `VERDICT: APPROVED`.
- Executes with TDD-first (Phase 6) via `superpowers:test-driven-development`. Auto-skip permitted only for `pure-documentation` and `pure-comment-whitespace-rename`; all other skips (including config-file additions) require explicit user approval, recorded in the TDD Approach section with an ISO-8601 timestamp.
- Runs lint/typecheck/tests as a **verification gate** (Phase 7) before the implementation review loop.
- Runs the implementation review loop (Phase 8) against the diff + verification output, iterating up to `MAX_ROUNDS` or until `APPROVED`.
- Scans every outbound reviewer payload for secrets (subroutine step 1a). Per-payload, no caching.
- Creates a **single commit** after the implementation review approves. Does NOT push. Asks the user for explicit `yes` before any push.
- Defaults to the **current branch**. Worktree only on explicit opt-in (`"in a worktree"`, `"use a worktree"`, `"on an isolated branch"`, `"on a new branch called X"`).
- Supports resume: detects existing folder by slug and uses `Status` + Runtime State to decide how to re-enter.
- Sends completion notifications through Telegram only when the shared setup in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) is installed and configured.
## Dual Review Loops
`do-task` runs the reviewer twice per successful run, with separate session IDs so reviewer context never leaks across loops.
1. **Plan review loop (Phase 5)** — payload is the current `task-plan.md` with `Runtime State` and `Review History` stripped. The reviewer evaluates whether the plan matches the prompt, whether assumptions are surfaced, whether acceptance criteria are testable, whether the TDD approach is appropriate, and whether there are missing files/risks/security concerns.
2. **Implementation review loop (Phase 8)** — payload is the approved task plan (without Runtime State) + `git diff` (unstaged + staged) + verification output (lint, typecheck, tests). The reviewer evaluates correctness, code quality, test coverage, security, and regression risk.
Both loops share the same 9-step subroutine and the same `MAX_ROUNDS` counter (default 10).
### Subroutine Steps (inside each review loop)
1. Write payload to `/tmp/do-task-<kind>-<REVIEW_ID>.md`.
2. **Secret scan (step 1a)** — per-payload, no caching. See Secret Scan section below.
3. Generate reviewer command script at `/tmp/do-task-<kind>-review-<REVIEW_ID>.sh`.
4. Run via `reviewer-runtime/run-review.sh`.
5. Promote reviewer output and capture the session ID on Round 1; persist it to `task-plan.md` Runtime State under the loop-specific variable (`CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, or `OPENCODE_IMPL_SESSION_ID`).
6. Parse verdict; append an entry to Review History; bump the round counter.
7. Branch: `APPROVED` → exit, `REVISE` → caller revises and re-enters, `MAX_ROUNDS` → caller decides.
8. Liveness contract: wait while `In progress N` heartbeats arrive from the runner.
9. Cleanup temp artifacts on success.
### Reviewer Output Contract
- `P0` = total blocker
- `P1` = major risk
- `P2` = must-fix before approval
- `P3` = cosmetic / nice to have
- Each severity section uses `- None.` when empty.
- `VERDICT: APPROVED` is valid only when no `P0`, `P1`, or `P2` findings remain.
- `P3` findings are non-blocking, but the caller should still try to fix them when cheap and safe.
## Runtime Artifacts
Per review loop (`<kind>` = `plan` or `implementation`):
- `/tmp/do-task-<kind>-<REVIEW_ID>.md` — payload
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.md` — normalized review text
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.json` — raw JSON (cursor always; opencode with `--format json`)
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.stderr` — reviewer stderr
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.status` — helper heartbeat/status log
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.runner.out` — helper-managed stdout
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.sh` — reviewer command script
Status log lines use this format:
```text
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
```
`in-progress` is the liveness heartbeat emitted roughly once per minute with `note="In progress N"`.
`stall-warning` is a non-terminal status-log state only. It does not mean the caller should stop waiting if `in-progress` heartbeats continue.
### Persistent Artifact
The one file kept across runs is `ai_plan/<slug>/task-plan.md`. Its `Status` enum drives resume decisions:
| Status | Meaning |
|---|---|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |
## Failure Handling
- `completed-empty-output` — the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause.
- `needs-operator-decision` — the helper reached hard-timeout escalation; surface `.status` and decide whether to extend the timeout, abort, or retry with different parameters.
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
- Verification gate (Phase 7) retries up to 3 times. On exhaustion, `Status` becomes `aborted-verification` and the user is asked whether to retry, override, or abort.
- As long as fresh `in-progress` heartbeats continue to arrive roughly once per minute, the caller keeps waiting.
## Secret Scan (subroutine step 1a; per-payload; no caching)
Every outbound reviewer payload is scanned **before** being sent to the reviewer CLI. This scan runs on every round of both loops. No results are cached, because the Phase 8 payload includes newly-introduced diff content that earlier rounds never saw.
Canonical anchored regex list (10 patterns):
```
AWS access key: AKIA[0-9A-Z]{16}
GCP service-acct: "type"\s*:\s*"service_account"
GitHub tokens: (ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
Slack tokens: xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
OpenAI API keys: sk-(proj-)?[A-Za-z0-9_-]{20,}
Anthropic API keys: sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
PEM private keys: -----BEGIN [A-Z ]+ PRIVATE KEY-----
.env-style: (TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
JWT: eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
```
If a match is found, the skill **redacts the matched text before showing it to the user** using the fixed token `[REDACTED:<pattern-label>:<match-length>-chars]` (pattern labels: `aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`, `anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`). File paths and line numbers are kept. Raw match text is never echoed to terminal, chat log, or any persistent file.
The user answers `yes` / `no` / `redact`:
- `yes` — proceed; Runtime State records `last_scan_outcome_<kind>=user-approved-with-matches`.
- `redact` — the user supplies redactions, the skill applies them, and re-scans before sending. Runtime State records `last_scan_outcome_<kind>=redacted-and-approved`.
- `no` — stop the loop, set `Status: failed`, send Telegram summary.
## Supported Reviewer CLIs
| CLI | Round-1 command | Round-N resume | Output capture |
|---|---|---|---|
| `codex` | `codex exec -m <model> -s read-only -o <out.md> "<prompt>"` | `codex exec resume <session-id> -o <out.md> "<prompt>"` | `<out.md>` directly (helper `--success-file`) |
| `claude` | `claude -p "<prompt>" --model <model> --strict-mcp-config --setting-sources user` | Fresh call with prior-round context summary | `cp <runner.out> <out.md>` |
| `cursor` | `cursor-agent -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `cursor-agent --resume <id> -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `jq -r '.result' <out.json> > <out.md>` |
| `opencode` | `opencode run -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` | Fresh call (default) OR `opencode run -s <id> -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` (opt-in) | `jq -r '.[] \| select(.type == "message" and .role == "assistant") \| .content' <out.json> > <out.md>` |
For all four CLIs, the preferred execution path is:
1. Write the reviewer command to a bash script.
2. Run that script through `reviewer-runtime/run-review.sh`.
3. Fall back to direct synchronous execution only if the helper is missing or not executable.
## Notifications
- Telegram is the only supported notification path.
- Shared setup: [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
- Notification failures are non-blocking, but they must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, the skill sends a Telegram summary first if configured.
- Terminal outcomes that trigger Telegram: `pushed`, `local-only`, `aborted-plan-review`, `aborted-impl-review`, `aborted-verification`, `failed`.
The reviewer-runtime helper also supports manual override flags for diagnostics:
```bash
run-review.sh \
--command-file <path> \
--stdout-file <path> \
--stderr-file <path> \
--status-file <path> \
--poll-seconds 10 \
--soft-timeout-seconds 600 \
--stall-warning-seconds 300 \
--hard-timeout-seconds 1800
```
## Template Guardrails
All four `templates/task-plan.md` files share identical core sections (14 `## `-level headings) and identical Status enum (10 values). Variant-specific guardrail language is permitted in the leading blockquote and in the `Runtime` field of the Metadata table.
**Core sections** (appear in every variant, same order):
1. Metadata
2. Prompt
3. Interpretation
4. Assumptions
5. Files
6. Approach
7. TDD Approach
8. Acceptance Criteria
9. Verification
10. Rollback
11. Runtime State
12. Review History
13. Final Status
14. Guardrails (do NOT remove)
**Runtime State keys** (same across all variants): `plan_review_round`, `implementation_review_round`, `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, `OPENCODE_IMPL_SESSION_ID`, `last_phase_entered`, `last_round_ts`, `last_scan_outcome_plan`, `last_scan_outcome_impl`, `verification_attempts`, `tests_added_count`, `tdd_used`.
## Variant Hardening Notes
### Claude Code
- Must invoke explicit required sub-skills via the `Skill` tool:
- `superpowers:brainstorming`
- `superpowers:test-driven-development`
- `superpowers:verification-before-completion`
- `superpowers:finishing-a-development-branch`
- `superpowers:using-git-worktrees` (conditional)
- Must enforce plan-mode file-write guard in Phase 4:
- If currently in plan mode, instruct user to exit plan mode before writing `task-plan.md`.
### Codex
- Must use native skill discovery from `~/.agents/skills/` (no CLI wrappers).
- Must verify Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- Must invoke required sub-skills with explicit announcements before any action.
- Must track checklist-driven sub-skills with `update_plan` todos (Codex equivalent of `TodoWrite`).
- `Task` subagents are unavailable — do the work directly and state the limitation.
- Deprecated CLI commands (`superpowers-codex bootstrap`, `use-skill`) must NOT be used.
- Helper paths: `~/.codex/skills/reviewer-runtime/...`.
- No plan-mode guard (Codex has no plan-mode concept).
### OpenCode
- Must use OpenCode's native skill tool (not Claude's `Skill` tool syntax, not Codex's `~/.agents/skills/` paths).
- Phase 1 includes a Bootstrap Superpowers Context step that lists installed skills and confirms the required `superpowers/<skill>` set is discoverable before any other phase runs.
- Must verify Superpowers skill discovery under `~/.config/opencode/skills/superpowers`.
- Helper paths: `~/.config/opencode/skills/reviewer-runtime/...`.
- Opencode reviewer calls MUST use `--agent plan` (the built-in plan primary agent) for read-only posture.
- No plan-mode guard (OpenCode has no plan-mode concept).
### Cursor
- Must use workspace discovery from `.cursor/skills/` (repo-local) or `~/.cursor/skills/` (global).
- Must announce skill usage explicitly before invocation.
- `jq` is a hard prerequisite.
- Helper paths: `.cursor/skills/reviewer-runtime/...` preferred, `~/.cursor/skills/reviewer-runtime/...` fallback.
- Reviewer invocations MUST use `--mode=ask --trust --output-format json`. Never `--mode=agent`, never `--force`, never write-capable modes for reviewer calls.
- No plan-mode guard (Cursor has no plan-mode concept).
## Execution Workflow Rules
- The skill works from `ai_plan/YYYY-MM-DD-<slug>/task-plan.md` as its single persistent artifact.
- Current branch is the default; worktree is opt-in only through explicit trigger phrases.
- Plan review completes before any implementation starts.
- Phase 7 verification gate must pass before the implementation review starts.
- The task commit is a single commit created in Phase 9.
- The `.gitignore` infra commit (Phase 1) is explicitly separate from the task commit and is allowed even when the final task ends up `aborted` or `failed`.
- No push without explicit `yes` from the user.
- Secret scan runs per-payload with no caching.
- `MAX_ROUNDS=10` is shared across both loops (single mental model).
+1
View File
@@ -6,6 +6,7 @@ This directory contains user-facing docs for each skill.
- [ATLASSIAN.md](./ATLASSIAN.md) — Includes requirements, generated bundle sync, install, auth, safety rules, and usage examples for the Atlassian skill.
- [CREATE-PLAN.md](./CREATE-PLAN.md) — Includes requirements, install, verification, and execution workflow for create-plan.
- [DO-TASK.md](./DO-TASK.md) — Single-prompt end-to-end execution with dual reviewer loops (plan + implementation), TDD-first, single task commit. Sibling of create-plan/implement-plan scoped to ad-hoc tasks.
- [IMPLEMENT-PLAN.md](./IMPLEMENT-PLAN.md) — Includes requirements, install, verification, and milestone review workflow for implement-plan.
- [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) — Shared Telegram notification setup used by reviewer-driven skills.
- [WEB-AUTOMATION.md](./WEB-AUTOMATION.md) — Includes requirements, install, dependency verification, and usage examples for web-automation.
+749
View File
@@ -0,0 +1,749 @@
---
name: do-task
description: Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review). ALWAYS invoke when the user says `/do-task`, "do this task", "do task ...", "execute this task", or "make it so". Also invoke on the hint phrase "just do ...". Do NOT invoke on "implement this" (that phrase is reserved for implement-plan).
---
# Do Task (Claude Code)
Execute an ad-hoc user prompt end-to-end: parse → clarify → plan (with reviewer loop) → implement (TDD-first where applicable) → verify → implementation review loop → commit → optional push → notify.
This is a single-artifact sibling of `create-plan` + `implement-plan`. Unlike `implement-plan`, `do-task` operates on one persistent `task-plan.md` (not a full milestone plan) and defaults to the **current branch** (not a worktree).
## Prerequisite Check (MANDATORY)
Required:
- Claude Code CLI: `claude --version`
- Superpowers repo: `https://github.com/obra/superpowers`
- `superpowers:brainstorming`
- `superpowers:test-driven-development`
- `superpowers:verification-before-completion`
- `superpowers:finishing-a-development-branch`
- `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree)
- Shared reviewer runtime: `~/.claude/skills/reviewer-runtime/run-review.sh`
- Telegram notifier helper: `~/.claude/skills/reviewer-runtime/notify-telegram.sh`
If any required dependency is missing, stop immediately and return:
`Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
This variant depends on explicit sub-skill invocation via the `Skill` tool. Do NOT use shell wrappers.
## Trigger Phrase Detection
**Binding triggers** (always invoke this skill):
- `/do-task`
- "do this task"
- "do task ..."
- "execute this task"
- "make it so"
**Hint trigger** (invoke unless context clearly maps to another skill):
- "just do ..."
**Escape phrases** (skip the Phase 2 clarifying-question loop):
- `--no-questions`
- `"just do it:"`
- `"just do this:"`
- `"no questions:"`
**Excluded** (do NOT trigger `do-task`):
- "implement this" — reserved for `implement-plan`.
**Dropped defaults** (explicitly NOT binding triggers):
- "work on ..."
- "handle this"
- "take care of ..."
- "get this done"
**Worktree opt-in phrases** (Phase 4 takes the worktree branch):
- "in a worktree"
- "use a worktree"
- "on an isolated branch"
- "on a new branch called X"
## Process
### Phase 1: Preflight
1. Verify git repo: `git rev-parse --is-inside-work-tree`.
2. Verify `/ai_plan/` is present in `.gitignore`. If missing:
- Append `/ai_plan/` to `.gitignore`.
- Commit that infra change immediately with message `chore(gitignore): ignore ai_plan local planning artifacts`.
- This infra commit is EXPLICITLY separate from the task commit in Phase 9. It may occur even when the final task ends up `aborted` or `failed`.
3. Verify required sub-skills are discoverable through the `Skill` tool. Announce each sub-skill before invocation using:
`I've read the [Skill Name] skill and I'm using it to [purpose].`
### Phase 2: Parse Prompt and Question
1. Capture the exact user prompt verbatim.
2. Detect trigger phrase (see above) and record which one matched.
3. Detect escape phrase. If set, skip clarifying questions entirely.
4. Apply the ask-first heuristic:
- Skip clarifying questions ONLY if ALL are true:
- Prompt names a concrete target (file, feature, or function).
- Prompt names a concrete outcome (what success looks like).
- Prompt has no ambiguous scope (no "and maybe also ...").
- All identifiers in the prompt are resolvable against the codebase.
- Otherwise, ask 1-3 clarifying questions, ONE AT A TIME, multiple-choice preferred.
- Empty prompt → ask exactly once: "what task?".
5. Invoke `superpowers:brainstorming` via the `Skill` tool for any **behavior-changing** task — feature creation, bug fix with multiple plausible approaches, refactor, design decision. Present 2-3 approaches and recommend one before finalizing the plan. The ONLY skip conditions are the same ones that allow TDD auto-skip: `pure-documentation` and `pure-comment-whitespace-rename`. When skipping, record the skip reason in the Interpretation section of `task-plan.md`.
### Phase 3: Configure Reviewer
If the user has already specified a reviewer CLI and model (e.g., "do task X, review with codex gpt-5.4"), use those values. If the user says "use defaults" or otherwise opts out of explicit configuration, proceed with `REVIEWER_CLI=codex`, `REVIEWER_MODEL=gpt-5.4`, and `MAX_ROUNDS=10`. Otherwise, ask:
1. **Which CLI should review both the plan and the implementation?**
- `codex` — OpenAI Codex CLI (`codex exec`)
- `claude` — Claude Code CLI (`claude -p`)
- `cursor` — Cursor Agent CLI (`cursor-agent -p`)
- `opencode` — OpenCode CLI (`opencode run`)
- `skip` — No external review, proceed with user approval only at each loop.
2. **Which model?** (only if a CLI was chosen)
- For `codex`: default `gpt-5.4`, alternatives: `gpt-5.3-codex`, `o4-mini`, `o3`.
- For `claude`: default `sonnet`, alternatives: `opus`, `haiku`.
- For `cursor`: **run `cursor-agent models` first** to see available models.
- For `opencode`: provider-qualified form `<provider>/<model>` (e.g., `anthropic/claude-sonnet-4-5`, `openai/gpt-5.4`). Run `opencode models` to list available models.
- Accept any model string the user provides.
3. **Max review rounds shared across both loops?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phases 5 and 8.
### Phase 4: Initialize Plan Workspace
**PLAN MODE CHECK:** If currently in plan mode:
1. Inform user that `task-plan.md` cannot be written while in plan mode.
2. Instruct user to exit plan mode (approve plan or use `ExitPlanMode`).
3. Proceed with file generation only after exiting plan mode.
Steps:
1. Compute slug: `YYYY-MM-DD-<slug>` where `<slug>` is a kebab-case hash of the task goal (lowercase, alphanumeric + hyphens only).
2. Compute plan folder: `ai_plan/<slug>/`.
3. **Resume detection:** If the folder already exists, read `task-plan.md`:
- If `Status` is `draft` or `plan-approved` or `implementation-in-progress`: offer to resume, pick a new suffix (`<slug>-v2`), or abort. Default is resume.
- If `Status` is any terminal value (`pushed`, `local-only`, `aborted-*`, `failed`): offer a new suffix or abort. Default is new suffix.
4. If not resuming, create the folder and write `task-plan.md` from the template at `templates/task-plan.md` (this skill's template folder; falls back to `~/.claude/skills/do-task/templates/task-plan.md` when installed directly).
5. Fill in:
- `Metadata` block.
- `Prompt` (verbatim).
- `Interpretation`, `Assumptions`, `Files`, `Approach`, `TDD Approach`, `Acceptance Criteria`, `Verification`, `Rollback`.
- Leave `Runtime State`, `Review History`, `Final Status` empty (skill updates these).
6. Set `Status: draft`.
**Worktree branch:** If the prompt opts in to a worktree (see Trigger Phrase Detection), invoke `superpowers:using-git-worktrees` via the `Skill` tool before proceeding. Otherwise continue on the current branch.
### Phase 5: Plan Review Loop
If `REVIEWER_CLI=skip`, present `task-plan.md` to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = plan
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
PAYLOAD_PATH = /tmp/do-task-plan-${REVIEW_ID}.md
PROMPT_TEMPLATE = PLAN_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_PLAN_SESSION_ID | CURSOR_PLAN_SESSION_ID | OPENCODE_PLAN_SESSION_ID
```
Payload is the current `task-plan.md` **with the `Runtime State` and `Review History` blocks stripped** before writing to `PAYLOAD_PATH`. Those two blocks contain reviewer session IDs and scan outcomes that must never be sent back to any reviewer CLI. Reviewers only need the Prompt, Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback, and Metadata sections.
`PLAN_REVIEW_PROMPT`:
```text
Review this task plan for completeness, correctness, and risk. Focus on:
1. Does the plan match the user's prompt?
2. Are all assumptions surfaced?
3. Are acceptance criteria testable?
4. Is the TDD approach appropriate per the TDD Approach section?
5. Are there missing files, risks, or security concerns?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: plan-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 6.
On MAX_ROUNDS:
- Set `Status: aborted-plan-review`.
- Send Telegram summary before stopping.
- Ask the user whether to override and proceed, restart, or abort.
### Phase 6: Execute (TDD-first where applicable)
Native orchestration — do not invoke `superpowers:executing-plans`.
1. Set `Status: implementation-in-progress`.
2. For every behavior-changing file edit:
- Invoke `superpowers:test-driven-development` via the `Skill` tool.
- Write the failing test first. Run it. Confirm it fails.
- Implement the minimal code to make it pass. Run the test. Confirm green.
- Do NOT commit yet — a single task commit happens in Phase 9.
3. Auto-skip of TDD is permitted ONLY for tasks classified in `task-plan.md` TDD Approach as:
- `pure-documentation`
- `pure-comment-whitespace-rename`
4. Any other skip (including `pure-config-addition`) requires explicit user approval recorded in `task-plan.md` with an ISO-8601 timestamp.
5. Update `task-plan.md` after each logical step: add notes to `Approach`, check off `Acceptance Criteria` items as they complete.
### Phase 7: Verification Gate
Invoke `superpowers:verification-before-completion` via the `Skill` tool.
Run the commands listed in the `Verification` section of `task-plan.md`:
- Lint (changed files first).
- Typecheck.
- Tests (targeted first, then broader suite if quick).
All must pass. If a command fails:
- Fix the issue.
- Re-run that command.
- Increment `verification_attempts` in Runtime State.
If `verification_attempts` exceeds 3 without green:
- Set `Status: aborted-verification`.
- Send Telegram summary.
- Ask the user whether to retry, override, or abort.
### Phase 8: Implementation Review Loop
If `REVIEWER_CLI=skip`, present a diff + verification summary to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = implementation
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) # distinct from plan-review ID
PAYLOAD_PATH = /tmp/do-task-implementation-${REVIEW_ID}.md
PROMPT_TEMPLATE = IMPL_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_IMPL_SESSION_ID | CURSOR_IMPL_SESSION_ID | OPENCODE_IMPL_SESSION_ID
```
Payload contents (assembled by the skill):
```markdown
# Implementation Review: [Short Title]
## Task Plan (the plan that was approved)
<embed approved task-plan.md, excluding Runtime State block>
## Changes Made (git diff)
<output of: `git diff` for unstaged + `git diff --staged` for staged>
## Verification Output
### Lint
<lint output>
### Typecheck
<typecheck output>
### Tests
<test output, pass/fail counts>
```
`IMPL_REVIEW_PROMPT`:
```text
Review this implementation against the task plan. Focus on:
1. Correctness — Does the diff satisfy the Acceptance Criteria?
2. Code quality — Clean, maintainable, no obvious issues?
3. Test coverage — Are behavior changes adequately tested (per the plan's TDD Approach)?
4. Security — Any security concerns introduced?
5. Regressions — Does the diff risk breaking unrelated code?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: implementation-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 9.
On MAX_ROUNDS:
- Set `Status: aborted-impl-review`.
- Send Telegram summary.
- Ask the user whether to override and commit anyway, restart, or abort.
### Phase 9: Commit + Push Ask
Invoke `superpowers:finishing-a-development-branch` via the `Skill` tool.
1. Stage all changed files explicitly (avoid `git add -A`).
2. Single commit with message derived from the task goal:
- Format: `<type>(<scope>): <short description>`
- Example: `feat(auth): add session token rotation`
3. Do NOT push. Update `Status: local-only`.
4. Ask the user: "Push to remote? (yes / no)"
- On explicit `yes` → push, then set `Status: pushed`.
- Any other response → leave `Status: local-only`.
### Phase 10: Telegram Notification + Finalize
Resolve the notifier helper:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.claude/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome (`pushed`, `local-only`, `aborted-*`, `failed`), send a Telegram summary if both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "do-task <slug>: <status summary>"
fi
```
Rules:
- Telegram is the only supported notification path.
- Notification failures are non-blocking but must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured.
- If Telegram is not configured, state that no Telegram notification was sent.
Fill in `Final Status` in `task-plan.md` (include commit hash if any). Do NOT delete the plan folder — it stays as a record.
---
## Review Loop (Shared Subroutine)
This subroutine is invoked twice per `do-task` run: once in Phase 5 (`REVIEW_KIND=plan`) and once in Phase 8 (`REVIEW_KIND=implementation`). Separate session IDs are used for each loop so reviewer context never leaks across loops.
### Subroutine Inputs
| Variable | Purpose |
|----------|---------|
| `REVIEW_KIND` | `plan` or `implementation` |
| `REVIEW_ID` | 8-char hex (from `uuidgen`); reused across rounds of the same loop |
| `PAYLOAD_PATH` | `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` |
| `PROMPT_TEMPLATE` | `PLAN_REVIEW_PROMPT` or `IMPL_REVIEW_PROMPT` |
| `REVIEWER_CLI` | `codex` \| `claude` \| `cursor` \| `opencode` |
| `REVIEWER_MODEL` | Model name |
| `MAX_ROUNDS` | Default 10 |
| `SESSION_ID_VAR` | `CODEX_PLAN_SESSION_ID` \| `CODEX_IMPL_SESSION_ID` \| `CURSOR_PLAN_SESSION_ID` \| `CURSOR_IMPL_SESSION_ID` \| `OPENCODE_PLAN_SESSION_ID` \| `OPENCODE_IMPL_SESSION_ID` |
Temp artifact paths (per loop):
- `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` — payload
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md` — normalized review text
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json` — raw Cursor/OpenCode JSON (cursor only, plus opencode when `--format json` is used)
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh`
Resolve the shared helper:
```bash
REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
;;
esac
```
### Step 1: Write Payload
Write the full payload for this round to `PAYLOAD_PATH`.
### Step 1a: Secret Scan (per-payload, no caching)
**BEFORE** sending the payload to any reviewer CLI, scan it for secrets. This scan runs EVERY round — no results are cached. Rationale: Phase 8 payloads include newly-introduced diff content that earlier rounds never saw.
Run the secret scan with all of these anchored regexes. Use `grep -En` on the payload file:
```bash
SECRET_REGEX_FILE=$(mktemp)
cat >"$SECRET_REGEX_FILE" <<'EOF'
AKIA[0-9A-Z]{16}
"type"\s*:\s*"service_account"
(ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
sk-(proj-)?[A-Za-z0-9_-]{20,}
sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
-----BEGIN [A-Z ]+ PRIVATE KEY-----
(TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
EOF
SCAN_MATCHES=$(grep -Ensf "$SECRET_REGEX_FILE" "$PAYLOAD_PATH" || true)
rm -f "$SECRET_REGEX_FILE"
```
If `SCAN_MATCHES` is non-empty:
1. **Redact the matched text before surfacing** — never echo the raw secret to the user, chat log, terminal scrollback, or any persistent file. Replace each matched substring with a fixed token that preserves only the fact of a match: `[REDACTED:<pattern-label>:<match-length>-chars]`. Example: a matched AWS key becomes `[REDACTED:aws-access-key:20-chars]`. Keep the file path and line number; they are useful for the user and not secret.
2. Present the redacted match summary to the user using this exact wording:
```
SECRET-SCAN MATCH in outbound reviewer payload (loop: ${REVIEW_KIND}, round: N):
<file>:<line>: [REDACTED:<pattern-label>:<match-length>-chars]
...
Proceed with sending this payload to ${REVIEWER_CLI}? (yes / no / redact)
```
Pattern labels: `aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`, `anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`.
2. Wait for user response.
3. On `yes`: record `last_scan_outcome_${REVIEW_KIND}=user-approved-with-matches` in Runtime State, and proceed.
4. On `redact`: ask the user to supply redactions, apply them to `PAYLOAD_PATH`, re-scan (this step), record `last_scan_outcome_${REVIEW_KIND}=redacted-and-approved`.
5. On `no`: stop the loop, set `Status: failed`, send Telegram, return to the user.
If `SCAN_MATCHES` is empty, record `last_scan_outcome_${REVIEW_KIND}=clean` and proceed.
### Step 2: Generate Reviewer Command Script
Write the reviewer invocation to `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh` as a bash script starting with:
```bash
#!/usr/bin/env bash
set -euo pipefail
```
**If `REVIEWER_CLI` is `codex`:**
Round 1 — fresh `codex exec`:
```bash
codex exec \
-m ${REVIEWER_MODEL} \
-s read-only \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"Review the ${REVIEW_KIND} payload in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
${PROMPT_TEMPLATE}"
```
Do not capture the Codex session ID yet. After Round 1 completes, extract it from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` (look for `session id: <uuid>`) and persist it to Runtime State under `${SESSION_ID_VAR}`.
Round 2 and later — resume session:
```bash
codex exec resume ${SESSION_ID_VAR_VALUE} \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before.
Keep findings ordered P0 to P3, use '- None.' when a severity has no findings, and only use VERDICT: APPROVED when no P0, P1, or P2 findings remain."
```
If resume fails, fall back to fresh `codex exec` with prior-round context.
**If `REVIEWER_CLI` is `claude`:**
Fresh call every round (Claude CLI has no session resume):
```bash
claude -p \
"${ROUND_PREFIX}Review the following ${REVIEW_KIND} payload.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
${PROMPT_TEMPLATE}" \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
```
Where `${ROUND_PREFIX}` is empty for Round 1 and `"You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points]. "` for subsequent rounds.
**If `REVIEWER_CLI` is `cursor`:**
Round 1:
```bash
cursor-agent -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later — resume:
```bash
cursor-agent --resume ${SESSION_ID_VAR_VALUE} -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
If resume fails, fall back to fresh `cursor-agent -p`.
After the command completes, extract the session id and review text:
```bash
CURSOR_SID=$(jq -r '.session_id' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
Persist `CURSOR_SID` to Runtime State under `${SESSION_ID_VAR}` on Round 1.
**If `REVIEWER_CLI` is `opencode`:**
OpenCode does not expose a dedicated read-only flag at the CLI level; use the built-in `plan` primary agent (`--agent plan`) for review, which is read-oriented and does not modify files. Session resume is supported via `-s <session-id>`, but the most reliable pattern for non-interactive review is **fresh call each round** (like `claude`) because opencode's session lifecycle and ID capture are less standardized than codex/cursor for headless runs. Skills MAY opt-in to session resume when they have verified the installed opencode version exposes a stable session id in `--format json` output.
Round 1 (preferred, fresh call):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later (fresh-call fallback path — recommended default):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"You previously reviewed this ${REVIEW_KIND} and requested revisions.
Previous feedback summary: [key points from last review]
I've revised. Updated payload is below.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Optional session-resume path (only if the installed opencode reliably emits a session id in `--format json` output and accepts it back via `-s`):
```bash
# Round 2+ with resume
opencode run \
-s ${SESSION_ID_VAR_VALUE} \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"I've revised. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Extract the review body (the JSON stream emits events; the final assistant message contains the review text):
```bash
jq -r '.[] | select(.type == "message" and .role == "assistant") | .content' \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
|| cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
If the JSON parse falls through, promote the raw JSON file as the review output and surface a warning to the user. On any opencode CLI or JSON parsing failure, treat this loop round as `completed-empty-output` and follow the helper-failure escalation in Step 6.
### Step 3: Run via `run-review.sh`
Run the command script through the shared helper when available:
```bash
if [ -x "$REVIEWER_RUNTIME" ]; then
"$REVIEWER_RUNTIME" \
--command-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
--stdout-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
--status-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
2>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll the `.status` file instead of treating heartbeats as post-hoc-only data.
### Step 4: Promote Reviewer Output + Capture Session ID
After the command completes:
- `cursor`: already promoted in Step 2 via `jq -r '.result' ...`. Also capture `session_id` if first round.
- `codex`: extract `CODEX_SESSION_ID` from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text lives only in `.runner.out`, `cp` it into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `claude`: promote `.runner.out` into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `opencode`: already promoted in Step 2 via `jq` on the JSON stream. If opt-in session-resume is active and the JSON includes a stable session id, capture it and persist to `${SESSION_ID_VAR}`.
On Round 1, persist the captured session ID (if any) into `task-plan.md`'s Runtime State under `${SESSION_ID_VAR}`.
### Step 5: Parse Verdict + Update Review History
1. Read `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md`.
2. Append one row to `task-plan.md` Review History:
- Timestamp (ISO-8601 UTC).
- Loop (`plan` or `implementation`).
- Round number.
- Verdict (`APPROVED` or `REVISE`).
- Summary (first line of the `## Summary` section).
3. Increment `plan_review_round` or `implementation_review_round` in Runtime State.
### Step 6: Branch APPROVED / REVISE / MAX_ROUNDS
Verdict rules:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → exit the subroutine with `APPROVED`.
- **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if cheap and safe, then exit with `APPROVED`.
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to revision (see below), then return to Step 1 for the next round.
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as APPROVED.
- Helper state `completed-empty-output` → treat as failed review attempt, surface `.stderr`/`.status`, fix invocation or prompt handling, then retry.
- Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters.
- Round counter ≥ `MAX_ROUNDS` → exit the subroutine with `MAX_ROUNDS`. Caller decides next action per Phase 5 or Phase 8.
**Revision:** The caller (Phase 5 for plan, Phase 6/7 for implementation) applies findings in priority order (`P0` → `P1` → `P2` → `P3`). For implementation review revisions, Phase 7 verification must be re-run after every revision before returning to Step 1.
### Step 7: Liveness Contract (during Step 3)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- Keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
### Step 8: Cleanup (on successful round exit)
```bash
rm -f /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh
```
If the round failed, produced empty output, or reached operator-decision timeout, KEEP `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them.
---
## Resume Semantics
1. Detect existing plan folder by slug at Phase 4.
2. Read `task-plan.md` → `Status`.
3. Decide next action:
| Status | Action |
|--------|--------|
| `draft` | Resume at Phase 5 (plan review) |
| `plan-approved` | Resume at Phase 6 (execute) |
| `implementation-in-progress` | Resume at Phase 6 (continue execute) |
| `implementation-approved` | Resume at Phase 9 (commit + push ask) |
| `pushed` \| `local-only` | Ask user: new suffix, abort, or replay for reference only |
| `aborted-*` \| `failed` | Offer new suffix or full restart |
4. When resuming, read Runtime State for `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, `OPENCODE_IMPL_SESSION_ID`, and the round counters. If a session ID is populated, use it for the first revision round in that loop (Round 2) via `codex exec resume`, `cursor-agent --resume`, or `opencode run -s <id>` as applicable.
---
## Tracker Discipline (MANDATORY)
**ALWAYS update `task-plan.md` before/after each phase transition. NEVER proceed with stale state.**
Before starting any phase:
1. Update `Status` if it transitions.
2. Update `last_phase_entered` in Runtime State.
After completing any phase:
1. Update `Status` if it transitions.
2. Append notes to the relevant section of `task-plan.md`.
Review History is append-only.
---
## Execution Workflow Rules
- Current branch is the default; worktree is opt-in only.
- Do NOT push without explicit "yes".
- Secret scan runs **per-payload, no caching** — every round, including revisions.
- Review loops use `MAX_ROUNDS=10` by default, shared across both loops.
- The task commit is a single commit created in Phase 9; interim WIP commits are NOT created.
- The `.gitignore` infra commit in Phase 1 is explicitly separate from the task commit and is allowed even on abort.
---
## Verification Checklist
- [ ] `ai_plan/` exists and `/ai_plan/` is in `.gitignore`
- [ ] `task-plan.md` created under `ai_plan/YYYY-MM-DD-<slug>/`
- [ ] Reviewer CLI + model + `MAX_ROUNDS` configured (or `skip`)
- [ ] Secret scan ran on every outbound reviewer payload
- [ ] Plan review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Phase 6 executed TDD-first for all behavior-changing steps (or documented skip)
- [ ] Phase 7 verification green before Phase 8
- [ ] Implementation review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Single task commit created locally, no push without explicit yes
- [ ] Telegram notification attempted if configured
- [ ] `task-plan.md` Final Status filled in
@@ -0,0 +1,144 @@
# Task Plan: [Short Title]
> **Variant guardrail (Claude Code):** When generating or updating this file, the agent MUST be out of plan mode. Sub-skills (`brainstorming`, `test-driven-development`, `verification-before-completion`, `finishing-a-development-branch`, `using-git-worktrees`) MUST be invoked through the `Skill` tool explicitly — no shell wrappers.
## Metadata
| Field | Value |
|-------|-------|
| Created | YYYY-MM-DD |
| Slug | YYYY-MM-DD-<slug> |
| Runtime | claude-code |
| Reviewer CLI | codex \| claude \| cursor \| opencode |
| Reviewer Model | <model> |
| MAX_ROUNDS | 10 |
| Branch Strategy | current-branch \| worktree |
| Branch Name | <current branch name, or new branch name when worktree is used> |
| Worktree Path | <absolute path to worktree dir; blank when Branch Strategy = current-branch> |
| Status | draft |
### Status Enum (authoritative)
| Value | Meaning |
|-------|---------|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |
---
## Prompt
<!-- Exact user prompt, verbatim. -->
## Interpretation
<!-- Short restatement of goal + out-of-scope items. -->
## Assumptions
<!-- Anything we're assuming and needs confirmation. Empty list OK after clarifying questions. -->
## Files
<!-- Files expected to be created / modified / deleted. Paths are absolute or repo-relative. -->
| Action | Path | Why |
|--------|------|-----|
| | | |
## Approach
<!-- 3-10 bullets describing implementation order. -->
## TDD Approach
<!-- One of:
(a) **TDD applies** — list the failing test(s) to write first, then implementation, then confirm green.
(b) **TDD auto-skipped** — reason must be exactly one of:
- `pure-documentation`
- `pure-comment-whitespace-rename`
(c) **TDD user-approved skip** — user explicitly approved skipping TDD for this task.
Record the approval timestamp (ISO-8601) and the specific reason (e.g., `pure-config-addition`).
-->
## Acceptance Criteria
- [ ] <criterion 1>
- [ ] <criterion 2>
## Verification
<!-- Commands to run:
lint: <cmd>
typecheck: <cmd>
tests: <cmd>
-->
## Rollback
<!-- How to undo: `git revert <hash>`, or manual steps if the change is not easily revertable. -->
---
## Runtime State
<!-- Updated by the skill at runtime. Used to detect resume and to persist reviewer session IDs across rounds. -->
```yaml
plan_review_round: 0
implementation_review_round: 0
CODEX_PLAN_SESSION_ID:
CODEX_IMPL_SESSION_ID:
CURSOR_PLAN_SESSION_ID:
CURSOR_IMPL_SESSION_ID:
OPENCODE_PLAN_SESSION_ID:
OPENCODE_IMPL_SESSION_ID:
last_phase_entered:
last_round_ts:
last_scan_outcome_plan:
last_scan_outcome_impl:
verification_attempts: 0
tests_added_count: 0
tdd_used: false
```
## Review History
<!-- Append one entry per reviewer round, both loops. -->
| Timestamp (ISO-8601) | Loop | Round | Verdict | Summary |
|----------------------|------|-------|---------|---------|
| | | | | |
## Final Status
<!-- Filled at the terminal outcome (phase 9/10). Populate at least:
- Terminal status (one of the 10 Status enum values)
- Commit hash (if any)
- Plan-review rounds used / MAX_ROUNDS
- Implementation-review rounds used / MAX_ROUNDS
- TDD used (true|false)
- Tests added count
- Verification attempts used
- Last round ISO-8601 timestamp
- Notes (anything the user should know when revisiting)
-->
---
## Guardrails (do NOT remove)
- This file is the single persistent artifact for `do-task`. Do not split it or delete it on success.
- `Status` must always match one of the 10 enum values.
- `Runtime State` is updated by the skill, not by the user.
- Review History is append-only.
- `last_scan_outcome_plan` and `last_scan_outcome_impl` record the most recent secret-scan result for each loop. They are informational; the scan itself runs per-payload with no caching.
+800
View File
@@ -0,0 +1,800 @@
---
name: do-task
description: Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review) in Codex. ALWAYS invoke when the user says `/do-task`, "do this task", "do task ...", "execute this task", or "make it so". Also invoke on the hint phrase "just do ...". Do NOT invoke on "implement this" (that phrase is reserved for implement-plan).
---
# Do Task (Codex Native Superpowers)
Execute an ad-hoc user prompt end-to-end: parse → clarify → plan (with reviewer loop) → implement (TDD-first where applicable) → verify → implementation review loop → commit → optional push → notify.
This is a single-artifact sibling of `create-plan` + `implement-plan`. Unlike `implement-plan`, `do-task` operates on one persistent `task-plan.md` (not a full milestone plan) and defaults to the **current branch** (not a worktree).
**Core principle:** Codex uses native skill discovery from `~/.agents/skills/`. Do not use deprecated `superpowers-codex bootstrap` or `use-skill` CLI commands.
## Prerequisite Check (MANDATORY)
Required:
- Codex CLI: `codex --version`
- Superpowers repo: `https://github.com/obra/superpowers`
- Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- `superpowers:brainstorming`
- `superpowers:test-driven-development`
- `superpowers:verification-before-completion`
- `superpowers:finishing-a-development-branch`
- `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree)
- Shared reviewer runtime: `~/.codex/skills/reviewer-runtime/run-review.sh`
- Telegram notifier helper: `~/.codex/skills/reviewer-runtime/notify-telegram.sh`
Verify before proceeding:
```bash
test -L ~/.agents/skills/superpowers
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```
If any required dependency is missing, stop immediately and return:
`Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
## Required Skill Invocation Rules
- Invoke relevant skills through native discovery (no CLI wrapper).
- Announce skill usage explicitly:
- `I've read the [Skill Name] skill and I'm using it to [purpose].`
- For skills with checklists, track checklist items with `update_plan` todos.
- Tool mapping for Codex:
- `TodoWrite` -> `update_plan`
- `Task` subagents -> unavailable in Codex; do the work directly and state the limitation
- `Skill` -> use native skill discovery from `~/.agents/skills/`
## Trigger Phrase Detection
**Binding triggers** (always invoke this skill):
- `/do-task`
- "do this task"
- "do task ..."
- "execute this task"
- "make it so"
**Hint trigger** (invoke unless context clearly maps to another skill):
- "just do ..."
**Escape phrases** (skip the Phase 2 clarifying-question loop):
- `--no-questions`
- `"just do it:"`
- `"just do this:"`
- `"no questions:"`
**Excluded** (do NOT trigger `do-task`):
- "implement this" — reserved for `implement-plan`.
**Dropped defaults** (explicitly NOT binding triggers):
- "work on ..."
- "handle this"
- "take care of ..."
- "get this done"
**Worktree opt-in phrases** (Phase 4 takes the worktree branch):
- "in a worktree"
- "use a worktree"
- "on an isolated branch"
- "on a new branch called X"
## Process
### Phase 1: Preflight
1. Verify git repo: `git rev-parse --is-inside-work-tree`.
2. Verify `/ai_plan/` is present in `.gitignore`. If missing:
- Append `/ai_plan/` to `.gitignore`.
- Commit that infra change immediately with message `chore(gitignore): ignore ai_plan local planning artifacts`.
- This infra commit is EXPLICITLY separate from the task commit in Phase 9. It may occur even when the final task ends up `aborted` or `failed`.
3. Verify required sub-skills are discoverable under `~/.agents/skills/superpowers/`. Announce each sub-skill before invocation using:
`I've read the [Skill Name] skill and I'm using it to [purpose].`
### Phase 2: Parse Prompt and Question
1. Capture the exact user prompt verbatim.
2. Detect trigger phrase (see above) and record which one matched.
3. Detect escape phrase. If set, skip clarifying questions entirely.
4. Apply the ask-first heuristic:
- Skip clarifying questions ONLY if ALL are true:
- Prompt names a concrete target (file, feature, or function).
- Prompt names a concrete outcome (what success looks like).
- Prompt has no ambiguous scope (no "and maybe also ...").
- All identifiers in the prompt are resolvable against the codebase.
- Otherwise, ask 1-3 clarifying questions, ONE AT A TIME, multiple-choice preferred.
- Empty prompt → ask exactly once: "what task?".
5. Invoke `superpowers:brainstorming` via native discovery (`~/.agents/skills/superpowers/brainstorming/SKILL.md`) for any **behavior-changing** task — feature creation, bug fix with multiple plausible approaches, refactor, design decision. Present 2-3 approaches and recommend one before finalizing the plan. The ONLY skip conditions are the same ones that allow TDD auto-skip: `pure-documentation` and `pure-comment-whitespace-rename`. When skipping, record the skip reason in the Interpretation section of `task-plan.md`.
### Phase 3: Configure Reviewer
If the user has already specified a reviewer CLI and model (e.g., "do task X, review with codex gpt-5.4"), use those values. If the user says "use defaults" or otherwise opts out of explicit configuration, proceed with `REVIEWER_CLI=codex`, `REVIEWER_MODEL=gpt-5.4`, and `MAX_ROUNDS=10`. Otherwise, ask:
1. **Which CLI should review both the plan and the implementation?**
- `codex` — OpenAI Codex CLI (`codex exec`)
- `claude` — Claude Code CLI (`claude -p`)
- `cursor` — Cursor Agent CLI (`cursor-agent -p`)
- `opencode` — OpenCode CLI (`opencode run`)
- `skip` — No external review, proceed with user approval only at each loop.
2. **Which model?** (only if a CLI was chosen)
- For `codex`: default `gpt-5.4`, alternatives: `gpt-5.3-codex`, `o4-mini`, `o3`.
- For `claude`: default `sonnet`, alternatives: `opus`, `haiku`.
- For `cursor`: **run `cursor-agent models` first** to see available models.
- For `opencode`: provider-qualified form `<provider>/<model>` (e.g., `anthropic/claude-sonnet-4-5`, `openai/gpt-5.4`). Run `opencode models` to list available models.
- Accept any model string the user provides.
3. **Max review rounds shared across both loops?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phases 5 and 8.
### Phase 4: Initialize Plan Workspace
Codex has no plan-mode concept; there is no plan-mode guard here.
Steps:
1. Compute slug: `YYYY-MM-DD-<slug>` where `<slug>` is a kebab-case hash of the task goal (lowercase, alphanumeric + hyphens only).
2. Compute plan folder: `ai_plan/<slug>/`.
3. **Resume detection:** If the folder already exists, read `task-plan.md`:
- If `Status` is `draft` or `plan-approved` or `implementation-in-progress`: offer to resume, pick a new suffix (`<slug>-v2`), or abort. Default is resume.
- If `Status` is any terminal value (`pushed`, `local-only`, `aborted-*`, `failed`): offer a new suffix or abort. Default is new suffix.
4. If not resuming, create the folder and write `task-plan.md` from the template at `templates/task-plan.md` (this skill's template folder; falls back to `~/.codex/skills/do-task/templates/task-plan.md` when installed directly).
5. Fill in:
- `Metadata` block.
- `Prompt` (verbatim).
- `Interpretation`, `Assumptions`, `Files`, `Approach`, `TDD Approach`, `Acceptance Criteria`, `Verification`, `Rollback`.
- Leave `Runtime State`, `Review History`, `Final Status` empty (skill updates these).
6. Set `Status: draft`.
**Worktree branch:** If the prompt opts in to a worktree (see Trigger Phrase Detection), invoke `superpowers:using-git-worktrees` via native discovery before proceeding. Otherwise continue on the current branch.
### Phase 5: Plan Review Loop
If `REVIEWER_CLI=skip`, present `task-plan.md` to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = plan
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
PAYLOAD_PATH = /tmp/do-task-plan-${REVIEW_ID}.md
PROMPT_TEMPLATE = PLAN_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_PLAN_SESSION_ID | CURSOR_PLAN_SESSION_ID | OPENCODE_PLAN_SESSION_ID
```
Payload is the current `task-plan.md` **with the `Runtime State` and `Review History` blocks stripped** before writing to `PAYLOAD_PATH`. Those two blocks contain reviewer session IDs and scan outcomes that must never be sent back to any reviewer CLI. Reviewers only need the Prompt, Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback, and Metadata sections.
`PLAN_REVIEW_PROMPT`:
```text
Review this task plan for completeness, correctness, and risk. Focus on:
1. Does the plan match the user's prompt?
2. Are all assumptions surfaced?
3. Are acceptance criteria testable?
4. Is the TDD approach appropriate per the TDD Approach section?
5. Are there missing files, risks, or security concerns?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: plan-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 6.
On MAX_ROUNDS:
- Set `Status: aborted-plan-review`.
- Send Telegram summary before stopping.
- Ask the user whether to override and proceed, restart, or abort.
### Phase 6: Execute (TDD-first where applicable)
Native orchestration — do not invoke `superpowers:executing-plans`.
1. Set `Status: implementation-in-progress`.
2. For every behavior-changing file edit:
- Invoke `superpowers:test-driven-development` via native discovery.
- Write the failing test first. Run it. Confirm it fails.
- Implement the minimal code to make it pass. Run the test. Confirm green.
- Do NOT commit yet — a single task commit happens in Phase 9.
3. Auto-skip of TDD is permitted ONLY for tasks classified in `task-plan.md` TDD Approach as:
- `pure-documentation`
- `pure-comment-whitespace-rename`
4. Any other skip (including `pure-config-addition`) requires explicit user approval recorded in `task-plan.md` with an ISO-8601 timestamp.
5. Update `task-plan.md` after each logical step: add notes to `Approach`, check off `Acceptance Criteria` items as they complete.
### Phase 7: Verification Gate
Invoke `superpowers:verification-before-completion` via native discovery.
Run the commands listed in the `Verification` section of `task-plan.md`:
- Lint (changed files first).
- Typecheck.
- Tests (targeted first, then broader suite if quick).
All must pass. If a command fails:
- Fix the issue.
- Re-run that command.
- Increment `verification_attempts` in Runtime State.
If `verification_attempts` exceeds 3 without green:
- Set `Status: aborted-verification`.
- Send Telegram summary.
- Ask the user whether to retry, override, or abort.
### Phase 8: Implementation Review Loop
If `REVIEWER_CLI=skip`, present a diff + verification summary to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = implementation
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) # distinct from plan-review ID
PAYLOAD_PATH = /tmp/do-task-implementation-${REVIEW_ID}.md
PROMPT_TEMPLATE = IMPL_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_IMPL_SESSION_ID | CURSOR_IMPL_SESSION_ID | OPENCODE_IMPL_SESSION_ID
```
Payload contents (assembled by the skill):
```markdown
# Implementation Review: [Short Title]
## Task Plan (the plan that was approved)
<embed approved task-plan.md, excluding Runtime State block>
## Changes Made (git diff)
<output of: `git diff` for unstaged + `git diff --staged` for staged>
## Verification Output
### Lint
<lint output>
### Typecheck
<typecheck output>
### Tests
<test output, pass/fail counts>
```
`IMPL_REVIEW_PROMPT`:
```text
Review this implementation against the task plan. Focus on:
1. Correctness — Does the diff satisfy the Acceptance Criteria?
2. Code quality — Clean, maintainable, no obvious issues?
3. Test coverage — Are behavior changes adequately tested (per the plan's TDD Approach)?
4. Security — Any security concerns introduced?
5. Regressions — Does the diff risk breaking unrelated code?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: implementation-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 9.
On MAX_ROUNDS:
- Set `Status: aborted-impl-review`.
- Send Telegram summary.
- Ask the user whether to override and commit anyway, restart, or abort.
### Phase 9: Commit + Push Ask
Invoke `superpowers:finishing-a-development-branch` via native discovery.
1. Stage all changed files explicitly (avoid `git add -A`).
2. Single commit with message derived from the task goal:
- Format: `<type>(<scope>): <short description>`
- Example: `feat(auth): add session token rotation`
3. Do NOT push. Update `Status: local-only`.
4. Ask the user: "Push to remote? (yes / no)"
- On explicit `yes` → push, then set `Status: pushed`.
- Any other response → leave `Status: local-only`.
### Phase 10: Telegram Notification + Finalize
Resolve the notifier helper:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.codex/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome (`pushed`, `local-only`, `aborted-*`, `failed`), send a Telegram summary if both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "do-task <slug>: <status summary>"
fi
```
Rules:
- Telegram is the only supported notification path.
- Notification failures are non-blocking but must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured.
- If Telegram is not configured, state that no Telegram notification was sent.
Fill in `Final Status` in `task-plan.md` (include commit hash if any). Do NOT delete the plan folder — it stays as a record.
---
## Review Loop (Shared Subroutine)
This subroutine is invoked twice per `do-task` run: once in Phase 5 (`REVIEW_KIND=plan`) and once in Phase 8 (`REVIEW_KIND=implementation`). Separate session IDs are used for each loop so reviewer context never leaks across loops.
### Subroutine Inputs
| Variable | Purpose |
|----------|---------|
| `REVIEW_KIND` | `plan` or `implementation` |
| `REVIEW_ID` | 8-char hex (from `uuidgen`); reused across rounds of the same loop |
| `PAYLOAD_PATH` | `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` |
| `PROMPT_TEMPLATE` | `PLAN_REVIEW_PROMPT` or `IMPL_REVIEW_PROMPT` |
| `REVIEWER_CLI` | `codex` \| `claude` \| `cursor` \| `opencode` |
| `REVIEWER_MODEL` | Model name |
| `MAX_ROUNDS` | Default 10 |
| `SESSION_ID_VAR` | `CODEX_PLAN_SESSION_ID` \| `CODEX_IMPL_SESSION_ID` \| `CURSOR_PLAN_SESSION_ID` \| `CURSOR_IMPL_SESSION_ID` \| `OPENCODE_PLAN_SESSION_ID` \| `OPENCODE_IMPL_SESSION_ID` |
Temp artifact paths (per loop):
- `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` — payload
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md` — normalized review text
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json` — raw Cursor/OpenCode JSON (cursor only, plus opencode when `--format json` is used)
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh`
Resolve the shared helper:
```bash
REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
;;
esac
```
### Step 1: Write Payload
Write the full payload for this round to `PAYLOAD_PATH`.
### Step 1a: Secret Scan (per-payload, no caching)
**BEFORE** sending the payload to any reviewer CLI, scan it for secrets. This scan runs EVERY round — no results are cached. Rationale: Phase 8 payloads include newly-introduced diff content that earlier rounds never saw.
Run the secret scan with all of these anchored regexes. Use `grep -En` on the payload file:
```bash
SECRET_REGEX_FILE=$(mktemp)
cat >"$SECRET_REGEX_FILE" <<'EOF'
AKIA[0-9A-Z]{16}
"type"\s*:\s*"service_account"
(ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
sk-(proj-)?[A-Za-z0-9_-]{20,}
sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
-----BEGIN [A-Z ]+ PRIVATE KEY-----
(TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
EOF
SCAN_MATCHES=$(grep -Ensf "$SECRET_REGEX_FILE" "$PAYLOAD_PATH" || true)
rm -f "$SECRET_REGEX_FILE"
```
If `SCAN_MATCHES` is non-empty:
1. **Redact the matched text before surfacing** — never echo the raw secret to the user, chat log, terminal scrollback, or any persistent file. Replace each matched substring with a fixed token that preserves only the fact of a match: `[REDACTED:<pattern-label>:<match-length>-chars]`. Example: a matched AWS key becomes `[REDACTED:aws-access-key:20-chars]`. Keep the file path and line number; they are useful for the user and not secret.
2. Present the redacted match summary to the user using this exact wording:
```
SECRET-SCAN MATCH in outbound reviewer payload (loop: ${REVIEW_KIND}, round: N):
<file>:<line>: [REDACTED:<pattern-label>:<match-length>-chars]
...
Proceed with sending this payload to ${REVIEWER_CLI}? (yes / no / redact)
```
Pattern labels: `aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`, `anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`.
2. Wait for user response.
3. On `yes`: record `last_scan_outcome_${REVIEW_KIND}=user-approved-with-matches` in Runtime State, and proceed.
4. On `redact`: ask the user to supply redactions, apply them to `PAYLOAD_PATH`, re-scan (this step), record `last_scan_outcome_${REVIEW_KIND}=redacted-and-approved`.
5. On `no`: stop the loop, set `Status: failed`, send Telegram, return to the user.
If `SCAN_MATCHES` is empty, record `last_scan_outcome_${REVIEW_KIND}=clean` and proceed.
### Step 2: Generate Reviewer Command Script
Write the reviewer invocation to `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh` as a bash script starting with:
```bash
#!/usr/bin/env bash
set -euo pipefail
```
**If `REVIEWER_CLI` is `codex`:**
Round 1 — fresh `codex exec`:
```bash
codex exec \
-m ${REVIEWER_MODEL} \
-s read-only \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"Review the ${REVIEW_KIND} payload in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
${PROMPT_TEMPLATE}"
```
Do not capture the Codex session ID yet. After Round 1 completes, extract it from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` (look for `session id: <uuid>`) and persist it to Runtime State under `${SESSION_ID_VAR}`.
Round 2 and later — resume session:
```bash
codex exec resume ${SESSION_ID_VAR_VALUE} \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before.
Keep findings ordered P0 to P3, use '- None.' when a severity has no findings, and only use VERDICT: APPROVED when no P0, P1, or P2 findings remain."
```
If resume fails, fall back to fresh `codex exec` with prior-round context.
**If `REVIEWER_CLI` is `claude`:**
Fresh call every round (Claude CLI has no session resume):
```bash
claude -p \
"${ROUND_PREFIX}Review the following ${REVIEW_KIND} payload.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
${PROMPT_TEMPLATE}" \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
```
Where `${ROUND_PREFIX}` is empty for Round 1 and `"You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points]. "` for subsequent rounds.
**If `REVIEWER_CLI` is `cursor`:**
Round 1:
```bash
cursor-agent -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later — resume:
```bash
cursor-agent --resume ${SESSION_ID_VAR_VALUE} -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
If resume fails, fall back to fresh `cursor-agent -p`.
After the command completes, extract the session id and review text:
```bash
CURSOR_SID=$(jq -r '.session_id' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
Persist `CURSOR_SID` to Runtime State under `${SESSION_ID_VAR}` on Round 1.
**If `REVIEWER_CLI` is `opencode`:**
OpenCode does not expose a dedicated read-only flag at the CLI level; use the built-in `plan` primary agent (`--agent plan`) for review, which is read-oriented and does not modify files. Session resume is supported via `-s <session-id>`, but the most reliable pattern for non-interactive review is **fresh call each round** (like `claude`) because opencode's session lifecycle and ID capture are less standardized than codex/cursor for headless runs. Skills MAY opt-in to session resume when they have verified the installed opencode version exposes a stable session id in `--format json` output.
Round 1 (preferred, fresh call):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later (fresh-call fallback path — recommended default):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"You previously reviewed this ${REVIEW_KIND} and requested revisions.
Previous feedback summary: [key points from last review]
I've revised. Updated payload is below.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Optional session-resume path (only if the installed opencode reliably emits a session id in `--format json` output and accepts it back via `-s`):
```bash
# Round 2+ with resume
opencode run \
-s ${SESSION_ID_VAR_VALUE} \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"I've revised. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Extract the review body (the JSON stream emits events; the final assistant message contains the review text):
```bash
jq -r '.[] | select(.type == "message" and .role == "assistant") | .content' \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
|| cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
If the JSON parse falls through, promote the raw JSON file as the review output and surface a warning to the user. On any opencode CLI or JSON parsing failure, treat this loop round as `completed-empty-output` and follow the helper-failure escalation in Step 6.
### Step 3: Run via `run-review.sh`
Run the command script through the shared helper when available:
```bash
if [ -x "$REVIEWER_RUNTIME" ]; then
"$REVIEWER_RUNTIME" \
--command-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
--stdout-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
--status-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
2>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll the `.status` file instead of treating heartbeats as post-hoc-only data.
### Step 4: Promote Reviewer Output + Capture Session ID
After the command completes:
- `cursor`: already promoted in Step 2 via `jq -r '.result' ...`. Also capture `session_id` if first round.
- `codex`: extract `CODEX_SESSION_ID` from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text lives only in `.runner.out`, `cp` it into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `claude`: promote `.runner.out` into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `opencode`: already promoted in Step 2 via `jq` on the JSON stream. If opt-in session-resume is active and the JSON includes a stable session id, capture it and persist to `${SESSION_ID_VAR}`.
On Round 1, persist the captured session ID (if any) into `task-plan.md`'s Runtime State under `${SESSION_ID_VAR}`.
### Step 5: Parse Verdict + Update Review History
1. Read `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md`.
2. Append one row to `task-plan.md` Review History:
- Timestamp (ISO-8601 UTC).
- Loop (`plan` or `implementation`).
- Round number.
- Verdict (`APPROVED` or `REVISE`).
- Summary (first line of the `## Summary` section).
3. Increment `plan_review_round` or `implementation_review_round` in Runtime State.
### Step 6: Branch APPROVED / REVISE / MAX_ROUNDS
Verdict rules:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → exit the subroutine with `APPROVED`.
- **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if cheap and safe, then exit with `APPROVED`.
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to revision (see below), then return to Step 1 for the next round.
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as APPROVED.
- Helper state `completed-empty-output` → treat as failed review attempt, surface `.stderr`/`.status`, fix invocation or prompt handling, then retry.
- Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters.
- Round counter ≥ `MAX_ROUNDS` → exit the subroutine with `MAX_ROUNDS`. Caller decides next action per Phase 5 or Phase 8.
**Revision:** The caller (Phase 5 for plan, Phase 6/7 for implementation) applies findings in priority order (`P0` → `P1` → `P2` → `P3`). For implementation review revisions, Phase 7 verification must be re-run after every revision before returning to Step 1.
### Step 7: Liveness Contract (during Step 3)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- Keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
### Step 8: Cleanup (on successful round exit)
```bash
rm -f /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh
```
If the round failed, produced empty output, or reached operator-decision timeout, KEEP `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them.
---
## Resume Semantics
1. Detect existing plan folder by slug at Phase 4.
2. Read `task-plan.md` → `Status`.
3. Decide next action:
| Status | Action |
|--------|--------|
| `draft` | Resume at Phase 5 (plan review) |
| `plan-approved` | Resume at Phase 6 (execute) |
| `implementation-in-progress` | Resume at Phase 6 (continue execute) |
| `implementation-approved` | Resume at Phase 9 (commit + push ask) |
| `pushed` \| `local-only` | Ask user: new suffix, abort, or replay for reference only |
| `aborted-*` \| `failed` | Offer new suffix or full restart |
4. When resuming, read Runtime State for `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, `OPENCODE_IMPL_SESSION_ID`, and the round counters. If a session ID is populated, use it for the first revision round in that loop (Round 2) via `codex exec resume`, `cursor-agent --resume`, or `opencode run -s <id>` as applicable.
---
## Tracker Discipline (MANDATORY)
**ALWAYS update `task-plan.md` before/after each phase transition. NEVER proceed with stale state.**
Before starting any phase:
1. Update `Status` if it transitions.
2. Update `last_phase_entered` in Runtime State.
After completing any phase:
1. Update `Status` if it transitions.
2. Append notes to the relevant section of `task-plan.md`.
Review History is append-only.
---
## Execution Workflow Rules
- Current branch is the default; worktree is opt-in only.
- Do NOT push without explicit "yes".
- Secret scan runs **per-payload, no caching** — every round, including revisions.
- Review loops use `MAX_ROUNDS=10` by default, shared across both loops.
- The task commit is a single commit created in Phase 9; interim WIP commits are NOT created.
- The `.gitignore` infra commit in Phase 1 is explicitly separate from the task commit and is allowed even on abort.
---
## Verification Checklist
- [ ] `ai_plan/` exists and `/ai_plan/` is in `.gitignore`
- [ ] `task-plan.md` created under `ai_plan/YYYY-MM-DD-<slug>/`
- [ ] Reviewer CLI + model + `MAX_ROUNDS` configured (or `skip`)
- [ ] Secret scan ran on every outbound reviewer payload
- [ ] Plan review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Phase 6 executed TDD-first for all behavior-changing steps (or documented skip)
- [ ] Phase 7 verification green before Phase 8
- [ ] Implementation review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Single task commit created locally, no push without explicit yes
- [ ] Telegram notification attempted if configured
- [ ] `task-plan.md` Final Status filled in
---
## Variant Hardening Notes — Codex
- Must use native skill discovery from `~/.agents/skills/` (no CLI wrappers).
- Must verify Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- Must invoke required sub-skills with explicit announcements before any action.
- Must track checklist-driven sub-skills with `update_plan` todos (the Codex equivalent of `TodoWrite`).
- `Task` subagents are unavailable in Codex — do the work directly and state the limitation in the plan if a subagent was expected.
- Deprecated CLI commands (`superpowers-codex bootstrap`, `use-skill`) must NOT be used.
- Helper paths are `~/.codex/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`.
- No plan-mode guard (Codex has no plan-mode concept).
## Common Mistakes
- Using deprecated commands like `superpowers-codex bootstrap` or `superpowers-codex use-skill`.
- Jumping to Phase 6 execution without running `superpowers:brainstorming` on behavior-changing work.
- Asking multiple clarifying questions in a single message.
- Forgetting `update_plan` tracking for sub-skill checklists.
- Forgetting to create/update `.gitignore` for `/ai_plan/`.
- Skipping the per-payload secret scan because "the previous round was clean".
- Pushing the task commit without explicit user approval.
## Red Flags — Stop and Correct
- You are about to run any `superpowers-codex` command.
- You are writing `task-plan.md` before the brainstorming step (when required) completes.
- You are pushing commits without user approval.
- You did not announce which skill you invoked and why.
- You are proceeding to implementation review with failing lint/typecheck/tests.
- You are echoing raw secret-scan matches to the user or logs.
+144
View File
@@ -0,0 +1,144 @@
# Task Plan: [Short Title]
> **Variant guardrail (Codex):** Sub-skills (`brainstorming`, `test-driven-development`, `verification-before-completion`, `finishing-a-development-branch`, `using-git-worktrees`) MUST be invoked through native skill discovery from `~/.agents/skills/superpowers/<skill>/SKILL.md` — no `superpowers-codex` CLI wrappers. Checklist-driven sub-skills MUST track items with `update_plan` todos.
## Metadata
| Field | Value |
|-------|-------|
| Created | YYYY-MM-DD |
| Slug | YYYY-MM-DD-<slug> |
| Runtime | codex |
| Reviewer CLI | codex \| claude \| cursor \| opencode |
| Reviewer Model | <model> |
| MAX_ROUNDS | 10 |
| Branch Strategy | current-branch \| worktree |
| Branch Name | <current branch name, or new branch name when worktree is used> |
| Worktree Path | <absolute path to worktree dir; blank when Branch Strategy = current-branch> |
| Status | draft |
### Status Enum (authoritative)
| Value | Meaning |
|-------|---------|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |
---
## Prompt
<!-- Exact user prompt, verbatim. -->
## Interpretation
<!-- Short restatement of goal + out-of-scope items. -->
## Assumptions
<!-- Anything we're assuming and needs confirmation. Empty list OK after clarifying questions. -->
## Files
<!-- Files expected to be created / modified / deleted. Paths are absolute or repo-relative. -->
| Action | Path | Why |
|--------|------|-----|
| | | |
## Approach
<!-- 3-10 bullets describing implementation order. -->
## TDD Approach
<!-- One of:
(a) **TDD applies** — list the failing test(s) to write first, then implementation, then confirm green.
(b) **TDD auto-skipped** — reason must be exactly one of:
- `pure-documentation`
- `pure-comment-whitespace-rename`
(c) **TDD user-approved skip** — user explicitly approved skipping TDD for this task.
Record the approval timestamp (ISO-8601) and the specific reason (e.g., `pure-config-addition`).
-->
## Acceptance Criteria
- [ ] <criterion 1>
- [ ] <criterion 2>
## Verification
<!-- Commands to run:
lint: <cmd>
typecheck: <cmd>
tests: <cmd>
-->
## Rollback
<!-- How to undo: `git revert <hash>`, or manual steps if the change is not easily revertable. -->
---
## Runtime State
<!-- Updated by the skill at runtime. Used to detect resume and to persist reviewer session IDs across rounds. -->
```yaml
plan_review_round: 0
implementation_review_round: 0
CODEX_PLAN_SESSION_ID:
CODEX_IMPL_SESSION_ID:
CURSOR_PLAN_SESSION_ID:
CURSOR_IMPL_SESSION_ID:
OPENCODE_PLAN_SESSION_ID:
OPENCODE_IMPL_SESSION_ID:
last_phase_entered:
last_round_ts:
last_scan_outcome_plan:
last_scan_outcome_impl:
verification_attempts: 0
tests_added_count: 0
tdd_used: false
```
## Review History
<!-- Append one entry per reviewer round, both loops. -->
| Timestamp (ISO-8601) | Loop | Round | Verdict | Summary |
|----------------------|------|-------|---------|---------|
| | | | | |
## Final Status
<!-- Filled at the terminal outcome (phase 9/10). Populate at least:
- Terminal status (one of the 10 Status enum values)
- Commit hash (if any)
- Plan-review rounds used / MAX_ROUNDS
- Implementation-review rounds used / MAX_ROUNDS
- TDD used (true|false)
- Tests added count
- Verification attempts used
- Last round ISO-8601 timestamp
- Notes (anything the user should know when revisiting)
-->
---
## Guardrails (do NOT remove)
- This file is the single persistent artifact for `do-task`. Do not split it or delete it on success.
- `Status` must always match one of the 10 enum values.
- `Runtime State` is updated by the skill, not by the user.
- Review History is append-only.
- `last_scan_outcome_plan` and `last_scan_outcome_impl` record the most recent secret-scan result for each loop. They are informational; the scan itself runs per-payload with no caching.
+805
View File
@@ -0,0 +1,805 @@
---
name: do-task
description: Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review) in Cursor Agent CLI. ALWAYS invoke when the user says `/do-task`, "do this task", "do task ...", "execute this task", or "make it so". Also invoke on the hint phrase "just do ...". Do NOT invoke on "implement this" (that phrase is reserved for implement-plan).
---
# Do Task (Cursor Agent CLI)
Execute an ad-hoc user prompt end-to-end: parse → clarify → plan (with reviewer loop) → implement (TDD-first where applicable) → verify → implementation review loop → commit → optional push → notify.
This is a single-artifact sibling of `create-plan` + `implement-plan`. Unlike `implement-plan`, `do-task` operates on one persistent `task-plan.md` (not a full milestone plan) and defaults to the **current branch** (not a worktree).
**Core principle:** Cursor Agent CLI discovers skills from `.cursor/skills/` (repo-local) or `~/.cursor/skills/` (global). It also reads `AGENTS.md` at the repo root for additional instructions.
## Prerequisite Check (MANDATORY)
Required:
- Cursor Agent CLI: `cursor-agent --version` (install via `curl https://cursor.com/install -fsS | bash`). Binary is `cursor-agent`; the alias `cursor agent` also works.
- `jq` (**required** — `do-task` always parses JSON output from at least the cursor reviewer branch, and other reviewers may produce JSON). Install via `brew install jq` (macOS) or your package manager. Verify: `jq --version`.
- Superpowers repo: `https://github.com/obra/superpowers`
- Superpowers skills installed under `.cursor/skills/` (repo-local) or `~/.cursor/skills/` (global)
- `superpowers:brainstorming`
- `superpowers:test-driven-development`
- `superpowers:verification-before-completion`
- `superpowers:finishing-a-development-branch`
- `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree)
- Shared reviewer runtime: `.cursor/skills/reviewer-runtime/run-review.sh` (repo-local, preferred) OR `~/.cursor/skills/reviewer-runtime/run-review.sh` (global fallback)
- Telegram notifier helper: `.cursor/skills/reviewer-runtime/notify-telegram.sh` OR `~/.cursor/skills/reviewer-runtime/notify-telegram.sh`
Verify before proceeding:
```bash
cursor-agent --version
jq --version
test -f .cursor/skills/superpowers/skills/brainstorming/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/brainstorming/SKILL.md
test -f .cursor/skills/superpowers/skills/test-driven-development/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/test-driven-development/SKILL.md
test -f .cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/verification-before-completion/SKILL.md
test -f .cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md
```
If any required dependency is missing, stop immediately and return:
`Missing dependency: [specific missing item]. Install Cursor Agent CLI, jq, and Superpowers skills under .cursor/skills/ or ~/.cursor/skills/, then retry.`
## Required Skill Invocation Rules
- Invoke relevant skills through workspace discovery (`.cursor/skills/` repo-local or `~/.cursor/skills/` global).
- Announce skill usage explicitly:
- `I've read the [Skill Name] skill and I'm using it to [purpose].`
- For skills with checklists, track checklist items explicitly in conversation.
- The reviewer CLI branch for `cursor` MUST use `--mode=ask --trust --output-format json`. Never use `--mode=agent` or `--force` for reviewer calls.
## Trigger Phrase Detection
**Binding triggers** (always invoke this skill):
- `/do-task`
- "do this task"
- "do task ..."
- "execute this task"
- "make it so"
**Hint trigger** (invoke unless context clearly maps to another skill):
- "just do ..."
**Escape phrases** (skip the Phase 2 clarifying-question loop):
- `--no-questions`
- `"just do it:"`
- `"just do this:"`
- `"no questions:"`
**Excluded** (do NOT trigger `do-task`):
- "implement this" — reserved for `implement-plan`.
**Dropped defaults** (explicitly NOT binding triggers):
- "work on ..."
- "handle this"
- "take care of ..."
- "get this done"
**Worktree opt-in phrases** (Phase 4 takes the worktree branch):
- "in a worktree"
- "use a worktree"
- "on an isolated branch"
- "on a new branch called X"
## Process
### Phase 1: Preflight
1. Verify git repo: `git rev-parse --is-inside-work-tree`.
2. Verify `/ai_plan/` is present in `.gitignore`. If missing:
- Append `/ai_plan/` to `.gitignore`.
- Commit that infra change immediately with message `chore(gitignore): ignore ai_plan local planning artifacts`.
- This infra commit is EXPLICITLY separate from the task commit in Phase 9. It may occur even when the final task ends up `aborted` or `failed`.
3. Verify required sub-skills are discoverable under `.cursor/skills/superpowers/skills/` or `~/.cursor/skills/superpowers/skills/`. Announce each sub-skill before invocation using:
`I've read the [Skill Name] skill and I'm using it to [purpose].`
### Phase 2: Parse Prompt and Question
1. Capture the exact user prompt verbatim.
2. Detect trigger phrase (see above) and record which one matched.
3. Detect escape phrase. If set, skip clarifying questions entirely.
4. Apply the ask-first heuristic:
- Skip clarifying questions ONLY if ALL are true:
- Prompt names a concrete target (file, feature, or function).
- Prompt names a concrete outcome (what success looks like).
- Prompt has no ambiguous scope (no "and maybe also ...").
- All identifiers in the prompt are resolvable against the codebase.
- Otherwise, ask 1-3 clarifying questions, ONE AT A TIME, multiple-choice preferred.
- Empty prompt → ask exactly once: "what task?".
5. Invoke `superpowers:brainstorming` via workspace discovery (`.cursor/skills/superpowers/skills/brainstorming/SKILL.md`) for any **behavior-changing** task — feature creation, bug fix with multiple plausible approaches, refactor, design decision. Present 2-3 approaches and recommend one before finalizing the plan. The ONLY skip conditions are the same ones that allow TDD auto-skip: `pure-documentation` and `pure-comment-whitespace-rename`. When skipping, record the skip reason in the Interpretation section of `task-plan.md`.
### Phase 3: Configure Reviewer
If the user has already specified a reviewer CLI and model (e.g., "do task X, review with codex gpt-5.4"), use those values. If the user says "use defaults" or otherwise opts out of explicit configuration, proceed with `REVIEWER_CLI=codex`, `REVIEWER_MODEL=gpt-5.4`, and `MAX_ROUNDS=10`. Otherwise, ask:
1. **Which CLI should review both the plan and the implementation?**
- `codex` — OpenAI Codex CLI (`codex exec`)
- `claude` — Claude Code CLI (`claude -p`)
- `cursor` — Cursor Agent CLI (`cursor-agent -p`)
- `opencode` — OpenCode CLI (`opencode run`)
- `skip` — No external review, proceed with user approval only at each loop.
2. **Which model?** (only if a CLI was chosen)
- For `codex`: default `gpt-5.4`, alternatives: `gpt-5.3-codex`, `o4-mini`, `o3`.
- For `claude`: default `sonnet`, alternatives: `opus`, `haiku`.
- For `cursor`: **run `cursor-agent models` first** to see available models.
- For `opencode`: provider-qualified form `<provider>/<model>` (e.g., `anthropic/claude-sonnet-4-5`, `openai/gpt-5.4`). Run `opencode models` to list available models.
- Accept any model string the user provides.
3. **Max review rounds shared across both loops?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phases 5 and 8.
### Phase 4: Initialize Plan Workspace
Cursor Agent CLI has no plan-mode concept; there is no plan-mode guard here.
Steps:
1. Compute slug: `YYYY-MM-DD-<slug>` where `<slug>` is a kebab-case hash of the task goal (lowercase, alphanumeric + hyphens only).
2. Compute plan folder: `ai_plan/<slug>/`.
3. **Resume detection:** If the folder already exists, read `task-plan.md`:
- If `Status` is `draft` or `plan-approved` or `implementation-in-progress`: offer to resume, pick a new suffix (`<slug>-v2`), or abort. Default is resume.
- If `Status` is any terminal value (`pushed`, `local-only`, `aborted-*`, `failed`): offer a new suffix or abort. Default is new suffix.
4. If not resuming, create the folder and write `task-plan.md` from the template at `templates/task-plan.md` (this skill's template folder; resolves to `.cursor/skills/do-task/templates/task-plan.md` repo-local or `~/.cursor/skills/do-task/templates/task-plan.md` global when installed directly).
5. Fill in:
- `Metadata` block.
- `Prompt` (verbatim).
- `Interpretation`, `Assumptions`, `Files`, `Approach`, `TDD Approach`, `Acceptance Criteria`, `Verification`, `Rollback`.
- Leave `Runtime State`, `Review History`, `Final Status` empty (skill updates these).
6. Set `Status: draft`.
**Worktree branch:** If the prompt opts in to a worktree (see Trigger Phrase Detection), invoke `superpowers:using-git-worktrees` via workspace discovery before proceeding. Otherwise continue on the current branch.
### Phase 5: Plan Review Loop
If `REVIEWER_CLI=skip`, present `task-plan.md` to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = plan
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
PAYLOAD_PATH = /tmp/do-task-plan-${REVIEW_ID}.md
PROMPT_TEMPLATE = PLAN_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_PLAN_SESSION_ID | CURSOR_PLAN_SESSION_ID | OPENCODE_PLAN_SESSION_ID
```
Payload is the current `task-plan.md` **with the `Runtime State` and `Review History` blocks stripped** before writing to `PAYLOAD_PATH`. Those two blocks contain reviewer session IDs and scan outcomes that must never be sent back to any reviewer CLI. Reviewers only need the Prompt, Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback, and Metadata sections.
`PLAN_REVIEW_PROMPT`:
```text
Review this task plan for completeness, correctness, and risk. Focus on:
1. Does the plan match the user's prompt?
2. Are all assumptions surfaced?
3. Are acceptance criteria testable?
4. Is the TDD approach appropriate per the TDD Approach section?
5. Are there missing files, risks, or security concerns?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: plan-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 6.
On MAX_ROUNDS:
- Set `Status: aborted-plan-review`.
- Send Telegram summary before stopping.
- Ask the user whether to override and proceed, restart, or abort.
### Phase 6: Execute (TDD-first where applicable)
Native orchestration — do not invoke `superpowers:executing-plans`.
1. Set `Status: implementation-in-progress`.
2. For every behavior-changing file edit:
- Invoke `superpowers:test-driven-development` via workspace discovery.
- Write the failing test first. Run it. Confirm it fails.
- Implement the minimal code to make it pass. Run the test. Confirm green.
- Do NOT commit yet — a single task commit happens in Phase 9.
3. Auto-skip of TDD is permitted ONLY for tasks classified in `task-plan.md` TDD Approach as:
- `pure-documentation`
- `pure-comment-whitespace-rename`
4. Any other skip (including `pure-config-addition`) requires explicit user approval recorded in `task-plan.md` with an ISO-8601 timestamp.
5. Update `task-plan.md` after each logical step: add notes to `Approach`, check off `Acceptance Criteria` items as they complete.
### Phase 7: Verification Gate
Invoke `superpowers:verification-before-completion` via workspace discovery.
Run the commands listed in the `Verification` section of `task-plan.md`:
- Lint (changed files first).
- Typecheck.
- Tests (targeted first, then broader suite if quick).
All must pass. If a command fails:
- Fix the issue.
- Re-run that command.
- Increment `verification_attempts` in Runtime State.
If `verification_attempts` exceeds 3 without green:
- Set `Status: aborted-verification`.
- Send Telegram summary.
- Ask the user whether to retry, override, or abort.
### Phase 8: Implementation Review Loop
If `REVIEWER_CLI=skip`, present a diff + verification summary to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = implementation
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) # distinct from plan-review ID
PAYLOAD_PATH = /tmp/do-task-implementation-${REVIEW_ID}.md
PROMPT_TEMPLATE = IMPL_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_IMPL_SESSION_ID | CURSOR_IMPL_SESSION_ID | OPENCODE_IMPL_SESSION_ID
```
Payload contents (assembled by the skill):
```markdown
# Implementation Review: [Short Title]
## Task Plan (the plan that was approved)
<embed approved task-plan.md, excluding Runtime State block>
## Changes Made (git diff)
<output of: `git diff` for unstaged + `git diff --staged` for staged>
## Verification Output
### Lint
<lint output>
### Typecheck
<typecheck output>
### Tests
<test output, pass/fail counts>
```
`IMPL_REVIEW_PROMPT`:
```text
Review this implementation against the task plan. Focus on:
1. Correctness — Does the diff satisfy the Acceptance Criteria?
2. Code quality — Clean, maintainable, no obvious issues?
3. Test coverage — Are behavior changes adequately tested (per the plan's TDD Approach)?
4. Security — Any security concerns introduced?
5. Regressions — Does the diff risk breaking unrelated code?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: implementation-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 9.
On MAX_ROUNDS:
- Set `Status: aborted-impl-review`.
- Send Telegram summary.
- Ask the user whether to override and commit anyway, restart, or abort.
### Phase 9: Commit + Push Ask
Invoke `superpowers:finishing-a-development-branch` via workspace discovery.
1. Stage all changed files explicitly (avoid `git add -A`).
2. Single commit with message derived from the task goal:
- Format: `<type>(<scope>): <short description>`
- Example: `feat(auth): add session token rotation`
3. Do NOT push. Update `Status: local-only`.
4. Ask the user: "Push to remote? (yes / no)"
- On explicit `yes` → push, then set `Status: pushed`.
- Any other response → leave `Status: local-only`.
### Phase 10: Telegram Notification + Finalize
Resolve the notifier helper (prefer repo-local, fall back to global):
```bash
if [ -x .cursor/skills/reviewer-runtime/notify-telegram.sh ]; then
TELEGRAM_NOTIFY_RUNTIME=.cursor/skills/reviewer-runtime/notify-telegram.sh
else
TELEGRAM_NOTIFY_RUNTIME=~/.cursor/skills/reviewer-runtime/notify-telegram.sh
fi
```
On every terminal outcome (`pushed`, `local-only`, `aborted-*`, `failed`), send a Telegram summary if both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "do-task <slug>: <status summary>"
fi
```
Rules:
- Telegram is the only supported notification path.
- Notification failures are non-blocking but must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured.
- If Telegram is not configured, state that no Telegram notification was sent.
Fill in `Final Status` in `task-plan.md` (include commit hash if any). Do NOT delete the plan folder — it stays as a record.
---
## Review Loop (Shared Subroutine)
This subroutine is invoked twice per `do-task` run: once in Phase 5 (`REVIEW_KIND=plan`) and once in Phase 8 (`REVIEW_KIND=implementation`). Separate session IDs are used for each loop so reviewer context never leaks across loops.
### Subroutine Inputs
| Variable | Purpose |
|----------|---------|
| `REVIEW_KIND` | `plan` or `implementation` |
| `REVIEW_ID` | 8-char hex (from `uuidgen`); reused across rounds of the same loop |
| `PAYLOAD_PATH` | `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` |
| `PROMPT_TEMPLATE` | `PLAN_REVIEW_PROMPT` or `IMPL_REVIEW_PROMPT` |
| `REVIEWER_CLI` | `codex` \| `claude` \| `cursor` \| `opencode` |
| `REVIEWER_MODEL` | Model name |
| `MAX_ROUNDS` | Default 10 |
| `SESSION_ID_VAR` | `CODEX_PLAN_SESSION_ID` \| `CODEX_IMPL_SESSION_ID` \| `CURSOR_PLAN_SESSION_ID` \| `CURSOR_IMPL_SESSION_ID` \| `OPENCODE_PLAN_SESSION_ID` \| `OPENCODE_IMPL_SESSION_ID` |
Temp artifact paths (per loop):
- `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` — payload
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md` — normalized review text
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json` — raw Cursor/OpenCode JSON (cursor only, plus opencode when `--format json` is used)
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh`
Resolve the shared helper (prefer repo-local, fall back to global):
```bash
if [ -x .cursor/skills/reviewer-runtime/run-review.sh ]; then
REVIEWER_RUNTIME=.cursor/skills/reviewer-runtime/run-review.sh
else
REVIEWER_RUNTIME=~/.cursor/skills/reviewer-runtime/run-review.sh
fi
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
;;
esac
```
### Step 1: Write Payload
Write the full payload for this round to `PAYLOAD_PATH`.
### Step 1a: Secret Scan (per-payload, no caching)
**BEFORE** sending the payload to any reviewer CLI, scan it for secrets. This scan runs EVERY round — no results are cached. Rationale: Phase 8 payloads include newly-introduced diff content that earlier rounds never saw.
Run the secret scan with all of these anchored regexes. Use `grep -En` on the payload file:
```bash
SECRET_REGEX_FILE=$(mktemp)
cat >"$SECRET_REGEX_FILE" <<'EOF'
AKIA[0-9A-Z]{16}
"type"\s*:\s*"service_account"
(ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
sk-(proj-)?[A-Za-z0-9_-]{20,}
sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
-----BEGIN [A-Z ]+ PRIVATE KEY-----
(TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
EOF
SCAN_MATCHES=$(grep -Ensf "$SECRET_REGEX_FILE" "$PAYLOAD_PATH" || true)
rm -f "$SECRET_REGEX_FILE"
```
If `SCAN_MATCHES` is non-empty:
1. **Redact the matched text before surfacing** — never echo the raw secret to the user, chat log, terminal scrollback, or any persistent file. Replace each matched substring with a fixed token that preserves only the fact of a match: `[REDACTED:<pattern-label>:<match-length>-chars]`. Example: a matched AWS key becomes `[REDACTED:aws-access-key:20-chars]`. Keep the file path and line number; they are useful for the user and not secret.
2. Present the redacted match summary to the user using this exact wording:
```
SECRET-SCAN MATCH in outbound reviewer payload (loop: ${REVIEW_KIND}, round: N):
<file>:<line>: [REDACTED:<pattern-label>:<match-length>-chars]
...
Proceed with sending this payload to ${REVIEWER_CLI}? (yes / no / redact)
```
Pattern labels: `aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`, `anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`.
2. Wait for user response.
3. On `yes`: record `last_scan_outcome_${REVIEW_KIND}=user-approved-with-matches` in Runtime State, and proceed.
4. On `redact`: ask the user to supply redactions, apply them to `PAYLOAD_PATH`, re-scan (this step), record `last_scan_outcome_${REVIEW_KIND}=redacted-and-approved`.
5. On `no`: stop the loop, set `Status: failed`, send Telegram, return to the user.
If `SCAN_MATCHES` is empty, record `last_scan_outcome_${REVIEW_KIND}=clean` and proceed.
### Step 2: Generate Reviewer Command Script
Write the reviewer invocation to `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh` as a bash script starting with:
```bash
#!/usr/bin/env bash
set -euo pipefail
```
**If `REVIEWER_CLI` is `codex`:**
Round 1 — fresh `codex exec`:
```bash
codex exec \
-m ${REVIEWER_MODEL} \
-s read-only \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"Review the ${REVIEW_KIND} payload in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
${PROMPT_TEMPLATE}"
```
Do not capture the Codex session ID yet. After Round 1 completes, extract it from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` (look for `session id: <uuid>`) and persist it to Runtime State under `${SESSION_ID_VAR}`.
Round 2 and later — resume session:
```bash
codex exec resume ${SESSION_ID_VAR_VALUE} \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before.
Keep findings ordered P0 to P3, use '- None.' when a severity has no findings, and only use VERDICT: APPROVED when no P0, P1, or P2 findings remain."
```
If resume fails, fall back to fresh `codex exec` with prior-round context.
**If `REVIEWER_CLI` is `claude`:**
Fresh call every round (Claude CLI has no session resume):
```bash
claude -p \
"${ROUND_PREFIX}Review the following ${REVIEW_KIND} payload.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
${PROMPT_TEMPLATE}" \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
```
Where `${ROUND_PREFIX}` is empty for Round 1 and `"You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points]. "` for subsequent rounds.
**If `REVIEWER_CLI` is `cursor`:**
Round 1:
```bash
cursor-agent -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later — resume:
```bash
cursor-agent --resume ${SESSION_ID_VAR_VALUE} -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
If resume fails, fall back to fresh `cursor-agent -p`.
After the command completes, extract the session id and review text:
```bash
CURSOR_SID=$(jq -r '.session_id' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
Persist `CURSOR_SID` to Runtime State under `${SESSION_ID_VAR}` on Round 1.
**If `REVIEWER_CLI` is `opencode`:**
OpenCode does not expose a dedicated read-only flag at the CLI level; use the built-in `plan` primary agent (`--agent plan`) for review, which is read-oriented and does not modify files. Session resume is supported via `-s <session-id>`, but the most reliable pattern for non-interactive review is **fresh call each round** (like `claude`) because opencode's session lifecycle and ID capture are less standardized than codex/cursor for headless runs. Skills MAY opt-in to session resume when they have verified the installed opencode version exposes a stable session id in `--format json` output.
Round 1 (preferred, fresh call):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later (fresh-call fallback path — recommended default):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"You previously reviewed this ${REVIEW_KIND} and requested revisions.
Previous feedback summary: [key points from last review]
I've revised. Updated payload is below.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Optional session-resume path (only if the installed opencode reliably emits a session id in `--format json` output and accepts it back via `-s`):
```bash
# Round 2+ with resume
opencode run \
-s ${SESSION_ID_VAR_VALUE} \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"I've revised. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Extract the review body (the JSON stream emits events; the final assistant message contains the review text):
```bash
jq -r '.[] | select(.type == "message" and .role == "assistant") | .content' \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
|| cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
If the JSON parse falls through, promote the raw JSON file as the review output and surface a warning to the user. On any opencode CLI or JSON parsing failure, treat this loop round as `completed-empty-output` and follow the helper-failure escalation in Step 6.
### Step 3: Run via `run-review.sh`
Run the command script through the shared helper when available:
```bash
if [ -x "$REVIEWER_RUNTIME" ]; then
"$REVIEWER_RUNTIME" \
--command-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
--stdout-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
--status-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
2>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll the `.status` file instead of treating heartbeats as post-hoc-only data.
### Step 4: Promote Reviewer Output + Capture Session ID
After the command completes:
- `cursor`: already promoted in Step 2 via `jq -r '.result' ...`. Also capture `session_id` if first round.
- `codex`: extract `CODEX_SESSION_ID` from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text lives only in `.runner.out`, `cp` it into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `claude`: promote `.runner.out` into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `opencode`: already promoted in Step 2 via `jq` on the JSON stream. If opt-in session-resume is active and the JSON includes a stable session id, capture it and persist to `${SESSION_ID_VAR}`.
On Round 1, persist the captured session ID (if any) into `task-plan.md`'s Runtime State under `${SESSION_ID_VAR}`.
### Step 5: Parse Verdict + Update Review History
1. Read `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md`.
2. Append one row to `task-plan.md` Review History:
- Timestamp (ISO-8601 UTC).
- Loop (`plan` or `implementation`).
- Round number.
- Verdict (`APPROVED` or `REVISE`).
- Summary (first line of the `## Summary` section).
3. Increment `plan_review_round` or `implementation_review_round` in Runtime State.
### Step 6: Branch APPROVED / REVISE / MAX_ROUNDS
Verdict rules:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → exit the subroutine with `APPROVED`.
- **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if cheap and safe, then exit with `APPROVED`.
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to revision (see below), then return to Step 1 for the next round.
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as APPROVED.
- Helper state `completed-empty-output` → treat as failed review attempt, surface `.stderr`/`.status`, fix invocation or prompt handling, then retry.
- Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters.
- Round counter ≥ `MAX_ROUNDS` → exit the subroutine with `MAX_ROUNDS`. Caller decides next action per Phase 5 or Phase 8.
**Revision:** The caller (Phase 5 for plan, Phase 6/7 for implementation) applies findings in priority order (`P0` → `P1` → `P2` → `P3`). For implementation review revisions, Phase 7 verification must be re-run after every revision before returning to Step 1.
### Step 7: Liveness Contract (during Step 3)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- Keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
### Step 8: Cleanup (on successful round exit)
```bash
rm -f /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh
```
If the round failed, produced empty output, or reached operator-decision timeout, KEEP `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them.
---
## Resume Semantics
1. Detect existing plan folder by slug at Phase 4.
2. Read `task-plan.md` → `Status`.
3. Decide next action:
| Status | Action |
|--------|--------|
| `draft` | Resume at Phase 5 (plan review) |
| `plan-approved` | Resume at Phase 6 (execute) |
| `implementation-in-progress` | Resume at Phase 6 (continue execute) |
| `implementation-approved` | Resume at Phase 9 (commit + push ask) |
| `pushed` \| `local-only` | Ask user: new suffix, abort, or replay for reference only |
| `aborted-*` \| `failed` | Offer new suffix or full restart |
4. When resuming, read Runtime State for `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, `OPENCODE_IMPL_SESSION_ID`, and the round counters. If a session ID is populated, use it for the first revision round in that loop (Round 2) via `codex exec resume`, `cursor-agent --resume`, or `opencode run -s <id>` as applicable.
---
## Tracker Discipline (MANDATORY)
**ALWAYS update `task-plan.md` before/after each phase transition. NEVER proceed with stale state.**
Before starting any phase:
1. Update `Status` if it transitions.
2. Update `last_phase_entered` in Runtime State.
After completing any phase:
1. Update `Status` if it transitions.
2. Append notes to the relevant section of `task-plan.md`.
Review History is append-only.
---
## Execution Workflow Rules
- Current branch is the default; worktree is opt-in only.
- Do NOT push without explicit "yes".
- Secret scan runs **per-payload, no caching** — every round, including revisions.
- Review loops use `MAX_ROUNDS=10` by default, shared across both loops.
- The task commit is a single commit created in Phase 9; interim WIP commits are NOT created.
- The `.gitignore` infra commit in Phase 1 is explicitly separate from the task commit and is allowed even on abort.
---
## Verification Checklist
- [ ] `ai_plan/` exists and `/ai_plan/` is in `.gitignore`
- [ ] `task-plan.md` created under `ai_plan/YYYY-MM-DD-<slug>/`
- [ ] Reviewer CLI + model + `MAX_ROUNDS` configured (or `skip`)
- [ ] Secret scan ran on every outbound reviewer payload
- [ ] Plan review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Phase 6 executed TDD-first for all behavior-changing steps (or documented skip)
- [ ] Phase 7 verification green before Phase 8
- [ ] Implementation review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Single task commit created locally, no push without explicit yes
- [ ] Telegram notification attempted if configured
- [ ] `task-plan.md` Final Status filled in
---
## Variant Hardening Notes — Cursor
- Must use workspace discovery from `.cursor/skills/` (repo-local) or `~/.cursor/skills/` (global).
- Reviewer invocation MUST use `--mode=ask --trust --output-format json`. Never `--mode=agent`, never `--force`, never write-capable modes for reviewer calls.
- `jq` is a hard prerequisite because the cursor reviewer branch always returns JSON that must be parsed for `session_id` and `result`.
- Helper paths are `.cursor/skills/reviewer-runtime/...` preferred, with `~/.cursor/skills/reviewer-runtime/...` as the fallback path.
- Sub-skills are invoked through workspace discovery with explicit announcement — Cursor does not expose a `Skill` tool.
- `cursor-agent --version` must succeed as part of the prerequisite check. The binary is `cursor-agent`; the alias `cursor agent` is equivalent.
- No plan-mode guard (Cursor Agent CLI has no plan-mode concept).
## Common Mistakes
- Forgetting `--trust` flag when running `cursor-agent` non-interactively (causes interactive prompt).
- Using `--mode=agent` or `--force` for reviews (reviewer must be read-only, use `--mode=ask`).
- Forgetting to install `jq` (cursor reviewer branch silently breaks).
- Skipping workspace-discovery skill verification in Prerequisite Check.
- Asking multiple clarifying questions in a single message.
- Skipping the per-payload secret scan because "the previous round was clean".
- Pushing the task commit without explicit user approval.
## Red Flags — Stop and Correct
- Reviewer CLI is running with `--mode=agent`, `--force`, or any write-capable flag.
- You did not announce which skill you invoked and why.
- You are proceeding to implementation review with failing lint/typecheck/tests.
- You are echoing raw secret-scan matches to the user or logs.
- You are pushing without explicit user approval.
@@ -0,0 +1,144 @@
# Task Plan: [Short Title]
> **Variant guardrail (Cursor):** Sub-skills (`brainstorming`, `test-driven-development`, `verification-before-completion`, `finishing-a-development-branch`, `using-git-worktrees`) MUST be invoked through workspace discovery from `.cursor/skills/superpowers/skills/<skill>/SKILL.md` or `~/.cursor/skills/superpowers/skills/<skill>/SKILL.md`. Reviewer invocations MUST use `--mode=ask --trust --output-format json`. `jq` is a hard prerequisite.
## Metadata
| Field | Value |
|-------|-------|
| Created | YYYY-MM-DD |
| Slug | YYYY-MM-DD-<slug> |
| Runtime | cursor |
| Reviewer CLI | codex \| claude \| cursor \| opencode |
| Reviewer Model | <model> |
| MAX_ROUNDS | 10 |
| Branch Strategy | current-branch \| worktree |
| Branch Name | <current branch name, or new branch name when worktree is used> |
| Worktree Path | <absolute path to worktree dir; blank when Branch Strategy = current-branch> |
| Status | draft |
### Status Enum (authoritative)
| Value | Meaning |
|-------|---------|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |
---
## Prompt
<!-- Exact user prompt, verbatim. -->
## Interpretation
<!-- Short restatement of goal + out-of-scope items. -->
## Assumptions
<!-- Anything we're assuming and needs confirmation. Empty list OK after clarifying questions. -->
## Files
<!-- Files expected to be created / modified / deleted. Paths are absolute or repo-relative. -->
| Action | Path | Why |
|--------|------|-----|
| | | |
## Approach
<!-- 3-10 bullets describing implementation order. -->
## TDD Approach
<!-- One of:
(a) **TDD applies** — list the failing test(s) to write first, then implementation, then confirm green.
(b) **TDD auto-skipped** — reason must be exactly one of:
- `pure-documentation`
- `pure-comment-whitespace-rename`
(c) **TDD user-approved skip** — user explicitly approved skipping TDD for this task.
Record the approval timestamp (ISO-8601) and the specific reason (e.g., `pure-config-addition`).
-->
## Acceptance Criteria
- [ ] <criterion 1>
- [ ] <criterion 2>
## Verification
<!-- Commands to run:
lint: <cmd>
typecheck: <cmd>
tests: <cmd>
-->
## Rollback
<!-- How to undo: `git revert <hash>`, or manual steps if the change is not easily revertable. -->
---
## Runtime State
<!-- Updated by the skill at runtime. Used to detect resume and to persist reviewer session IDs across rounds. -->
```yaml
plan_review_round: 0
implementation_review_round: 0
CODEX_PLAN_SESSION_ID:
CODEX_IMPL_SESSION_ID:
CURSOR_PLAN_SESSION_ID:
CURSOR_IMPL_SESSION_ID:
OPENCODE_PLAN_SESSION_ID:
OPENCODE_IMPL_SESSION_ID:
last_phase_entered:
last_round_ts:
last_scan_outcome_plan:
last_scan_outcome_impl:
verification_attempts: 0
tests_added_count: 0
tdd_used: false
```
## Review History
<!-- Append one entry per reviewer round, both loops. -->
| Timestamp (ISO-8601) | Loop | Round | Verdict | Summary |
|----------------------|------|-------|---------|---------|
| | | | | |
## Final Status
<!-- Filled at the terminal outcome (phase 9/10). Populate at least:
- Terminal status (one of the 10 Status enum values)
- Commit hash (if any)
- Plan-review rounds used / MAX_ROUNDS
- Implementation-review rounds used / MAX_ROUNDS
- TDD used (true|false)
- Tests added count
- Verification attempts used
- Last round ISO-8601 timestamp
- Notes (anything the user should know when revisiting)
-->
---
## Guardrails (do NOT remove)
- This file is the single persistent artifact for `do-task`. Do not split it or delete it on success.
- `Status` must always match one of the 10 enum values.
- `Runtime State` is updated by the skill, not by the user.
- Review History is append-only.
- `last_scan_outcome_plan` and `last_scan_outcome_impl` record the most recent secret-scan result for each loop. They are informational; the scan itself runs per-payload with no caching.
+801
View File
@@ -0,0 +1,801 @@
---
name: do-task
description: Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review) in OpenCode. ALWAYS invoke when the user says `/do-task`, "do this task", "do task ...", "execute this task", or "make it so". Also invoke on the hint phrase "just do ...". Do NOT invoke on "implement this" (that phrase is reserved for implement-plan).
---
# Do Task (OpenCode)
Execute an ad-hoc user prompt end-to-end: parse → clarify → plan (with reviewer loop) → implement (TDD-first where applicable) → verify → implementation review loop → commit → optional push → notify.
This is a single-artifact sibling of `create-plan` + `implement-plan`. Unlike `implement-plan`, `do-task` operates on one persistent `task-plan.md` (not a full milestone plan) and defaults to the **current branch** (not a worktree).
**Core principle:** OpenCode loads skills through its native skill tool from `~/.config/opencode/skills/`. Sub-skill invocations use OpenCode's native mechanism — not Claude's `Skill` tool, not Codex's native-discovery patterns, not Cursor's workspace discovery.
## Prerequisite Check (MANDATORY)
Required:
- OpenCode CLI: `opencode --version` (install via your package manager or `brew install opencode`).
- Superpowers repo: `https://github.com/obra/superpowers`
- OpenCode Superpowers skills symlink: `~/.config/opencode/skills/superpowers`
- `superpowers/brainstorming`
- `superpowers/test-driven-development`
- `superpowers/verification-before-completion`
- `superpowers/finishing-a-development-branch`
- `superpowers/using-git-worktrees` (only when the prompt opts in to a worktree)
- Shared reviewer runtime: `~/.config/opencode/skills/reviewer-runtime/run-review.sh`
- Telegram notifier helper: `~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh`
Verify before proceeding:
```bash
opencode --version
ls -l ~/.config/opencode/skills/superpowers
test -f ~/.config/opencode/skills/superpowers/brainstorming/SKILL.md
test -f ~/.config/opencode/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.config/opencode/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.config/opencode/skills/superpowers/finishing-a-development-branch/SKILL.md
```
If any required dependency is missing, stop immediately and return:
`Missing dependency: [specific missing item]. Install required OpenCode Superpowers skills (https://github.com/obra/superpowers, OpenCode setup) and the reviewer-runtime helper, then retry.`
## Required Skill Invocation Rules
- Invoke relevant skills through OpenCode's native skill tool.
- Announce skill usage explicitly:
- `I've read the [Skill Name] skill and I'm using it to [purpose].`
- For skills with checklists, track checklist items explicitly in conversation.
- Do NOT use Claude's `Skill` tool syntax, Codex's `~/.agents/skills/` native-discovery paths, or Cursor's workspace discovery. OpenCode's skill system is independent.
## Trigger Phrase Detection
**Binding triggers** (always invoke this skill):
- `/do-task`
- "do this task"
- "do task ..."
- "execute this task"
- "make it so"
**Hint trigger** (invoke unless context clearly maps to another skill):
- "just do ..."
**Escape phrases** (skip the Phase 2 clarifying-question loop):
- `--no-questions`
- `"just do it:"`
- `"just do this:"`
- `"no questions:"`
**Excluded** (do NOT trigger `do-task`):
- "implement this" — reserved for `implement-plan`.
**Dropped defaults** (explicitly NOT binding triggers):
- "work on ..."
- "handle this"
- "take care of ..."
- "get this done"
**Worktree opt-in phrases** (Phase 4 takes the worktree branch):
- "in a worktree"
- "use a worktree"
- "on an isolated branch"
- "on a new branch called X"
## Process
### Phase 1: Preflight (includes Bootstrap Superpowers Context)
1. **Bootstrap Superpowers context** — use OpenCode's native skill tool to list installed skills and confirm `superpowers/brainstorming`, `superpowers/test-driven-development`, `superpowers/verification-before-completion`, and `superpowers/finishing-a-development-branch` are discoverable. If any is missing, stop with the Prerequisite Check error message.
2. Verify git repo: `git rev-parse --is-inside-work-tree`.
3. Verify `/ai_plan/` is present in `.gitignore`. If missing:
- Append `/ai_plan/` to `.gitignore`.
- Commit that infra change immediately with message `chore(gitignore): ignore ai_plan local planning artifacts`.
- This infra commit is EXPLICITLY separate from the task commit in Phase 9. It may occur even when the final task ends up `aborted` or `failed`.
4. Announce each sub-skill before invocation using:
`I've read the [Skill Name] skill and I'm using it to [purpose].`
### Phase 2: Parse Prompt and Question
1. Capture the exact user prompt verbatim.
2. Detect trigger phrase (see above) and record which one matched.
3. Detect escape phrase. If set, skip clarifying questions entirely.
4. Apply the ask-first heuristic:
- Skip clarifying questions ONLY if ALL are true:
- Prompt names a concrete target (file, feature, or function).
- Prompt names a concrete outcome (what success looks like).
- Prompt has no ambiguous scope (no "and maybe also ...").
- All identifiers in the prompt are resolvable against the codebase.
- Otherwise, ask 1-3 clarifying questions, ONE AT A TIME, multiple-choice preferred.
- Empty prompt → ask exactly once: "what task?".
5. Invoke `superpowers/brainstorming` via OpenCode's native skill tool for any **behavior-changing** task — feature creation, bug fix with multiple plausible approaches, refactor, design decision. Present 2-3 approaches and recommend one before finalizing the plan. The ONLY skip conditions are the same ones that allow TDD auto-skip: `pure-documentation` and `pure-comment-whitespace-rename`. When skipping, record the skip reason in the Interpretation section of `task-plan.md`.
### Phase 3: Configure Reviewer
If the user has already specified a reviewer CLI and model (e.g., "do task X, review with codex gpt-5.4"), use those values. If the user says "use defaults" or otherwise opts out of explicit configuration, proceed with `REVIEWER_CLI=codex`, `REVIEWER_MODEL=gpt-5.4`, and `MAX_ROUNDS=10`. Otherwise, ask:
1. **Which CLI should review both the plan and the implementation?**
- `codex` — OpenAI Codex CLI (`codex exec`)
- `claude` — Claude Code CLI (`claude -p`)
- `cursor` — Cursor Agent CLI (`cursor-agent -p`)
- `opencode` — OpenCode CLI (`opencode run`)
- `skip` — No external review, proceed with user approval only at each loop.
2. **Which model?** (only if a CLI was chosen)
- For `codex`: default `gpt-5.4`, alternatives: `gpt-5.3-codex`, `o4-mini`, `o3`.
- For `claude`: default `sonnet`, alternatives: `opus`, `haiku`.
- For `cursor`: **run `cursor-agent models` first** to see available models.
- For `opencode`: provider-qualified form `<provider>/<model>` (e.g., `anthropic/claude-sonnet-4-5`, `openai/gpt-5.4`). Run `opencode models` to list available models.
- Accept any model string the user provides.
3. **Max review rounds shared across both loops?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phases 5 and 8.
### Phase 4: Initialize Plan Workspace
OpenCode has no plan-mode concept; there is no plan-mode guard here.
Steps:
1. Compute slug: `YYYY-MM-DD-<slug>` where `<slug>` is a kebab-case hash of the task goal (lowercase, alphanumeric + hyphens only).
2. Compute plan folder: `ai_plan/<slug>/`.
3. **Resume detection:** If the folder already exists, read `task-plan.md`:
- If `Status` is `draft` or `plan-approved` or `implementation-in-progress`: offer to resume, pick a new suffix (`<slug>-v2`), or abort. Default is resume.
- If `Status` is any terminal value (`pushed`, `local-only`, `aborted-*`, `failed`): offer a new suffix or abort. Default is new suffix.
4. If not resuming, create the folder and write `task-plan.md` from the template at `templates/task-plan.md` (this skill's template folder; falls back to `~/.config/opencode/skills/do-task/templates/task-plan.md` when installed directly).
5. Fill in:
- `Metadata` block.
- `Prompt` (verbatim).
- `Interpretation`, `Assumptions`, `Files`, `Approach`, `TDD Approach`, `Acceptance Criteria`, `Verification`, `Rollback`.
- Leave `Runtime State`, `Review History`, `Final Status` empty (skill updates these).
6. Set `Status: draft`.
**Worktree branch:** If the prompt opts in to a worktree (see Trigger Phrase Detection), invoke `superpowers/using-git-worktrees` via OpenCode's native skill tool before proceeding. Otherwise continue on the current branch.
### Phase 5: Plan Review Loop
If `REVIEWER_CLI=skip`, present `task-plan.md` to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = plan
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
PAYLOAD_PATH = /tmp/do-task-plan-${REVIEW_ID}.md
PROMPT_TEMPLATE = PLAN_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_PLAN_SESSION_ID | CURSOR_PLAN_SESSION_ID | OPENCODE_PLAN_SESSION_ID
```
Payload is the current `task-plan.md` **with the `Runtime State` and `Review History` blocks stripped** before writing to `PAYLOAD_PATH`. Those two blocks contain reviewer session IDs and scan outcomes that must never be sent back to any reviewer CLI. Reviewers only need the Prompt, Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback, and Metadata sections.
`PLAN_REVIEW_PROMPT`:
```text
Review this task plan for completeness, correctness, and risk. Focus on:
1. Does the plan match the user's prompt?
2. Are all assumptions surfaced?
3. Are acceptance criteria testable?
4. Is the TDD approach appropriate per the TDD Approach section?
5. Are there missing files, risks, or security concerns?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: plan-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 6.
On MAX_ROUNDS:
- Set `Status: aborted-plan-review`.
- Send Telegram summary before stopping.
- Ask the user whether to override and proceed, restart, or abort.
### Phase 6: Execute (TDD-first where applicable)
Native orchestration — do not invoke `superpowers:executing-plans`.
1. Set `Status: implementation-in-progress`.
2. For every behavior-changing file edit:
- Invoke `superpowers/test-driven-development` via OpenCode's native skill tool.
- Write the failing test first. Run it. Confirm it fails.
- Implement the minimal code to make it pass. Run the test. Confirm green.
- Do NOT commit yet — a single task commit happens in Phase 9.
3. Auto-skip of TDD is permitted ONLY for tasks classified in `task-plan.md` TDD Approach as:
- `pure-documentation`
- `pure-comment-whitespace-rename`
4. Any other skip (including `pure-config-addition`) requires explicit user approval recorded in `task-plan.md` with an ISO-8601 timestamp.
5. Update `task-plan.md` after each logical step: add notes to `Approach`, check off `Acceptance Criteria` items as they complete.
### Phase 7: Verification Gate
Invoke `superpowers/verification-before-completion` via OpenCode's native skill tool.
Run the commands listed in the `Verification` section of `task-plan.md`:
- Lint (changed files first).
- Typecheck.
- Tests (targeted first, then broader suite if quick).
All must pass. If a command fails:
- Fix the issue.
- Re-run that command.
- Increment `verification_attempts` in Runtime State.
If `verification_attempts` exceeds 3 without green:
- Set `Status: aborted-verification`.
- Send Telegram summary.
- Ask the user whether to retry, override, or abort.
### Phase 8: Implementation Review Loop
If `REVIEWER_CLI=skip`, present a diff + verification summary to the user and proceed only after explicit user approval.
Otherwise, invoke the Review Loop (Shared Subroutine) with:
```
REVIEW_KIND = implementation
REVIEW_ID = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) # distinct from plan-review ID
PAYLOAD_PATH = /tmp/do-task-implementation-${REVIEW_ID}.md
PROMPT_TEMPLATE = IMPL_REVIEW_PROMPT (see below)
SESSION_ID_VAR = CODEX_IMPL_SESSION_ID | CURSOR_IMPL_SESSION_ID | OPENCODE_IMPL_SESSION_ID
```
Payload contents (assembled by the skill):
```markdown
# Implementation Review: [Short Title]
## Task Plan (the plan that was approved)
<embed approved task-plan.md, excluding Runtime State block>
## Changes Made (git diff)
<output of: `git diff` for unstaged + `git diff --staged` for staged>
## Verification Output
### Lint
<lint output>
### Typecheck
<typecheck output>
### Tests
<test output, pass/fail counts>
```
`IMPL_REVIEW_PROMPT`:
```text
Review this implementation against the task plan. Focus on:
1. Correctness — Does the diff satisfy the Acceptance Criteria?
2. Code quality — Clean, maintainable, no obvious issues?
3. Test coverage — Are behavior changes adequately tested (per the plan's TDD Approach)?
4. Security — Any security concerns introduced?
5. Regressions — Does the diff risk breaking unrelated code?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.
```
On APPROVED:
- Set `Status: implementation-approved`.
- Append APPROVED row to Review History.
- Proceed to Phase 9.
On MAX_ROUNDS:
- Set `Status: aborted-impl-review`.
- Send Telegram summary.
- Ask the user whether to override and commit anyway, restart, or abort.
### Phase 9: Commit + Push Ask
Invoke `superpowers/finishing-a-development-branch` via OpenCode's native skill tool.
1. Stage all changed files explicitly (avoid `git add -A`).
2. Single commit with message derived from the task goal:
- Format: `<type>(<scope>): <short description>`
- Example: `feat(auth): add session token rotation`
3. Do NOT push. Update `Status: local-only`.
4. Ask the user: "Push to remote? (yes / no)"
- On explicit `yes` → push, then set `Status: pushed`.
- Any other response → leave `Status: local-only`.
### Phase 10: Telegram Notification + Finalize
Resolve the notifier helper:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome (`pushed`, `local-only`, `aborted-*`, `failed`), send a Telegram summary if both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "do-task <slug>: <status summary>"
fi
```
Rules:
- Telegram is the only supported notification path.
- Notification failures are non-blocking but must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured.
- If Telegram is not configured, state that no Telegram notification was sent.
Fill in `Final Status` in `task-plan.md` (include commit hash if any). Do NOT delete the plan folder — it stays as a record.
---
## Review Loop (Shared Subroutine)
This subroutine is invoked twice per `do-task` run: once in Phase 5 (`REVIEW_KIND=plan`) and once in Phase 8 (`REVIEW_KIND=implementation`). Separate session IDs are used for each loop so reviewer context never leaks across loops.
### Subroutine Inputs
| Variable | Purpose |
|----------|---------|
| `REVIEW_KIND` | `plan` or `implementation` |
| `REVIEW_ID` | 8-char hex (from `uuidgen`); reused across rounds of the same loop |
| `PAYLOAD_PATH` | `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` |
| `PROMPT_TEMPLATE` | `PLAN_REVIEW_PROMPT` or `IMPL_REVIEW_PROMPT` |
| `REVIEWER_CLI` | `codex` \| `claude` \| `cursor` \| `opencode` |
| `REVIEWER_MODEL` | Model name |
| `MAX_ROUNDS` | Default 10 |
| `SESSION_ID_VAR` | `CODEX_PLAN_SESSION_ID` \| `CODEX_IMPL_SESSION_ID` \| `CURSOR_PLAN_SESSION_ID` \| `CURSOR_IMPL_SESSION_ID` \| `OPENCODE_PLAN_SESSION_ID` \| `OPENCODE_IMPL_SESSION_ID` |
Temp artifact paths (per loop):
- `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` — payload
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md` — normalized review text
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json` — raw Cursor/OpenCode JSON (cursor only, plus opencode when `--format json` is used)
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out`
- `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh`
Resolve the shared helper:
```bash
REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
;;
esac
```
### Step 1: Write Payload
Write the full payload for this round to `PAYLOAD_PATH`.
### Step 1a: Secret Scan (per-payload, no caching)
**BEFORE** sending the payload to any reviewer CLI, scan it for secrets. This scan runs EVERY round — no results are cached. Rationale: Phase 8 payloads include newly-introduced diff content that earlier rounds never saw.
Run the secret scan with all of these anchored regexes. Use `grep -En` on the payload file:
```bash
SECRET_REGEX_FILE=$(mktemp)
cat >"$SECRET_REGEX_FILE" <<'EOF'
AKIA[0-9A-Z]{16}
"type"\s*:\s*"service_account"
(ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
sk-(proj-)?[A-Za-z0-9_-]{20,}
sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
-----BEGIN [A-Z ]+ PRIVATE KEY-----
(TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
EOF
SCAN_MATCHES=$(grep -Ensf "$SECRET_REGEX_FILE" "$PAYLOAD_PATH" || true)
rm -f "$SECRET_REGEX_FILE"
```
If `SCAN_MATCHES` is non-empty:
1. **Redact the matched text before surfacing** — never echo the raw secret to the user, chat log, terminal scrollback, or any persistent file. Replace each matched substring with a fixed token that preserves only the fact of a match: `[REDACTED:<pattern-label>:<match-length>-chars]`. Example: a matched AWS key becomes `[REDACTED:aws-access-key:20-chars]`. Keep the file path and line number; they are useful for the user and not secret.
2. Present the redacted match summary to the user using this exact wording:
```
SECRET-SCAN MATCH in outbound reviewer payload (loop: ${REVIEW_KIND}, round: N):
<file>:<line>: [REDACTED:<pattern-label>:<match-length>-chars]
...
Proceed with sending this payload to ${REVIEWER_CLI}? (yes / no / redact)
```
Pattern labels: `aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`, `anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`.
2. Wait for user response.
3. On `yes`: record `last_scan_outcome_${REVIEW_KIND}=user-approved-with-matches` in Runtime State, and proceed.
4. On `redact`: ask the user to supply redactions, apply them to `PAYLOAD_PATH`, re-scan (this step), record `last_scan_outcome_${REVIEW_KIND}=redacted-and-approved`.
5. On `no`: stop the loop, set `Status: failed`, send Telegram, return to the user.
If `SCAN_MATCHES` is empty, record `last_scan_outcome_${REVIEW_KIND}=clean` and proceed.
### Step 2: Generate Reviewer Command Script
Write the reviewer invocation to `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh` as a bash script starting with:
```bash
#!/usr/bin/env bash
set -euo pipefail
```
**If `REVIEWER_CLI` is `codex`:**
Round 1 — fresh `codex exec`:
```bash
codex exec \
-m ${REVIEWER_MODEL} \
-s read-only \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"Review the ${REVIEW_KIND} payload in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
${PROMPT_TEMPLATE}"
```
Do not capture the Codex session ID yet. After Round 1 completes, extract it from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` (look for `session id: <uuid>`) and persist it to Runtime State under `${SESSION_ID_VAR}`.
Round 2 and later — resume session:
```bash
codex exec resume ${SESSION_ID_VAR_VALUE} \
-o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before.
Keep findings ordered P0 to P3, use '- None.' when a severity has no findings, and only use VERDICT: APPROVED when no P0, P1, or P2 findings remain."
```
If resume fails, fall back to fresh `codex exec` with prior-round context.
**If `REVIEWER_CLI` is `claude`:**
Fresh call every round (Claude CLI has no session resume):
```bash
claude -p \
"${ROUND_PREFIX}Review the following ${REVIEW_KIND} payload.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
${PROMPT_TEMPLATE}" \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
```
Where `${ROUND_PREFIX}` is empty for Round 1 and `"You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points]. "` for subsequent rounds.
**If `REVIEWER_CLI` is `cursor`:**
Round 1:
```bash
cursor-agent -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later — resume:
```bash
cursor-agent --resume ${SESSION_ID_VAR_VALUE} -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
If resume fails, fall back to fresh `cursor-agent -p`.
After the command completes, extract the session id and review text:
```bash
CURSOR_SID=$(jq -r '.session_id' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
Persist `CURSOR_SID` to Runtime State under `${SESSION_ID_VAR}` on Round 1.
**If `REVIEWER_CLI` is `opencode`:**
OpenCode does not expose a dedicated read-only flag at the CLI level; use the built-in `plan` primary agent (`--agent plan`) for review, which is read-oriented and does not modify files. Session resume is supported via `-s <session-id>`, but the most reliable pattern for non-interactive review is **fresh call each round** (like `claude`) because opencode's session lifecycle and ID capture are less standardized than codex/cursor for headless runs. Skills MAY opt-in to session resume when they have verified the installed opencode version exposes a stable session id in `--format json` output.
Round 1 (preferred, fresh call):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.
${PROMPT_TEMPLATE}" \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Round 2 and later (fresh-call fallback path — recommended default):
```bash
opencode run \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"You previously reviewed this ${REVIEW_KIND} and requested revisions.
Previous feedback summary: [key points from last review]
I've revised. Updated payload is below.
$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Optional session-resume path (only if the installed opencode reliably emits a session id in `--format json` output and accepts it back via `-s`):
```bash
# Round 2+ with resume
opencode run \
-s ${SESSION_ID_VAR_VALUE} \
-m ${REVIEWER_MODEL} \
--agent plan \
--format json \
"I've revised. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json
```
Extract the review body (the JSON stream emits events; the final assistant message contains the review text):
```bash
jq -r '.[] | select(.type == "message" and .role == "assistant") | .content' \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
> /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
|| cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
If the JSON parse falls through, promote the raw JSON file as the review output and surface a warning to the user. On any opencode CLI or JSON parsing failure, treat this loop round as `completed-empty-output` and follow the helper-failure escalation in Step 6.
### Step 3: Run via `run-review.sh`
Run the command script through the shared helper when available:
```bash
if [ -x "$REVIEWER_RUNTIME" ]; then
"$REVIEWER_RUNTIME" \
--command-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
--stdout-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
--status-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
2>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll the `.status` file instead of treating heartbeats as post-hoc-only data.
### Step 4: Promote Reviewer Output + Capture Session ID
After the command completes:
- `cursor`: already promoted in Step 2 via `jq -r '.result' ...`. Also capture `session_id` if first round.
- `codex`: extract `CODEX_SESSION_ID` from `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text lives only in `.runner.out`, `cp` it into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `claude`: promote `.runner.out` into the `.md` file:
```bash
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```
- `opencode`: already promoted in Step 2 via `jq` on the JSON stream. If opt-in session-resume is active and the JSON includes a stable session id, capture it and persist to `${SESSION_ID_VAR}`.
On Round 1, persist the captured session ID (if any) into `task-plan.md`'s Runtime State under `${SESSION_ID_VAR}`.
### Step 5: Parse Verdict + Update Review History
1. Read `/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md`.
2. Append one row to `task-plan.md` Review History:
- Timestamp (ISO-8601 UTC).
- Loop (`plan` or `implementation`).
- Round number.
- Verdict (`APPROVED` or `REVISE`).
- Summary (first line of the `## Summary` section).
3. Increment `plan_review_round` or `implementation_review_round` in Runtime State.
### Step 6: Branch APPROVED / REVISE / MAX_ROUNDS
Verdict rules:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → exit the subroutine with `APPROVED`.
- **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if cheap and safe, then exit with `APPROVED`.
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to revision (see below), then return to Step 1 for the next round.
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as APPROVED.
- Helper state `completed-empty-output` → treat as failed review attempt, surface `.stderr`/`.status`, fix invocation or prompt handling, then retry.
- Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters.
- Round counter ≥ `MAX_ROUNDS` → exit the subroutine with `MAX_ROUNDS`. Caller decides next action per Phase 5 or Phase 8.
**Revision:** The caller (Phase 5 for plan, Phase 6/7 for implementation) applies findings in priority order (`P0` → `P1` → `P2` → `P3`). For implementation review revisions, Phase 7 verification must be re-run after every revision before returning to Step 1.
### Step 7: Liveness Contract (during Step 3)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- Keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
### Step 8: Cleanup (on successful round exit)
```bash
rm -f /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh
```
If the round failed, produced empty output, or reached operator-decision timeout, KEEP `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them.
---
## Resume Semantics
1. Detect existing plan folder by slug at Phase 4.
2. Read `task-plan.md` → `Status`.
3. Decide next action:
| Status | Action |
|--------|--------|
| `draft` | Resume at Phase 5 (plan review) |
| `plan-approved` | Resume at Phase 6 (execute) |
| `implementation-in-progress` | Resume at Phase 6 (continue execute) |
| `implementation-approved` | Resume at Phase 9 (commit + push ask) |
| `pushed` \| `local-only` | Ask user: new suffix, abort, or replay for reference only |
| `aborted-*` \| `failed` | Offer new suffix or full restart |
4. When resuming, read Runtime State for `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`, `OPENCODE_IMPL_SESSION_ID`, and the round counters. If a session ID is populated, use it for the first revision round in that loop (Round 2) via `codex exec resume`, `cursor-agent --resume`, or `opencode run -s <id>` as applicable.
---
## Tracker Discipline (MANDATORY)
**ALWAYS update `task-plan.md` before/after each phase transition. NEVER proceed with stale state.**
Before starting any phase:
1. Update `Status` if it transitions.
2. Update `last_phase_entered` in Runtime State.
After completing any phase:
1. Update `Status` if it transitions.
2. Append notes to the relevant section of `task-plan.md`.
Review History is append-only.
---
## Execution Workflow Rules
- Current branch is the default; worktree is opt-in only.
- Do NOT push without explicit "yes".
- Secret scan runs **per-payload, no caching** — every round, including revisions.
- Review loops use `MAX_ROUNDS=10` by default, shared across both loops.
- The task commit is a single commit created in Phase 9; interim WIP commits are NOT created.
- The `.gitignore` infra commit in Phase 1 is explicitly separate from the task commit and is allowed even on abort.
---
## Verification Checklist
- [ ] `ai_plan/` exists and `/ai_plan/` is in `.gitignore`
- [ ] `task-plan.md` created under `ai_plan/YYYY-MM-DD-<slug>/`
- [ ] Reviewer CLI + model + `MAX_ROUNDS` configured (or `skip`)
- [ ] Secret scan ran on every outbound reviewer payload
- [ ] Plan review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Phase 6 executed TDD-first for all behavior-changing steps (or documented skip)
- [ ] Phase 7 verification green before Phase 8
- [ ] Implementation review completed (APPROVED, MAX_ROUNDS handled, or skipped)
- [ ] Single task commit created locally, no push without explicit yes
- [ ] Telegram notification attempted if configured
- [ ] `task-plan.md` Final Status filled in
---
## Variant Hardening Notes — OpenCode
- Must use OpenCode's native skill tool for sub-skill invocation. Do NOT use Claude's `Skill` tool syntax or Codex's `~/.agents/skills/` paths.
- Phase 1 includes a Bootstrap Superpowers Context step that lists installed skills and confirms `superpowers/brainstorming`, `superpowers/test-driven-development`, `superpowers/verification-before-completion`, and `superpowers/finishing-a-development-branch` are discoverable before any other phase runs.
- Helper paths are `~/.config/opencode/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`.
- OpenCode reviewer CLI branch (when `REVIEWER_CLI=opencode`):
- Binary: `opencode`. Non-interactive: `opencode run "<message>"`.
- Model: `-m <provider>/<model>` (e.g., `openai/gpt-5.4`, `anthropic/claude-sonnet-4-5`).
- Read-only posture: `--agent plan` (uses OpenCode's built-in plan primary agent; no explicit `--read-only` flag exists).
- Output: `--format json` for structured output. Extraction uses `jq` against the JSON event stream.
- Session resume: `-s <session-id>` or `--continue`. Fresh call each round is the recommended default since session id capture is less standardized than codex/cursor for headless runs.
- No plan-mode guard (OpenCode has no plan-mode concept).
## Common Mistakes
- Skipping the Bootstrap Superpowers Context step in Phase 1 (breaks native skill discovery).
- Using Claude `Skill` tool syntax or Codex `~/.agents/skills/` paths.
- Forgetting to set `--agent plan` on opencode reviewer calls (would use the default `build` agent which can write files).
- Asking multiple clarifying questions in a single message.
- Skipping the per-payload secret scan because "the previous round was clean".
- Pushing the task commit without explicit user approval.
- Using a non-provider-qualified model string for opencode (e.g., `gpt-5.4` instead of `openai/gpt-5.4`).
## Red Flags — Stop and Correct
- You are invoking sub-skills via Claude's `Skill` tool or Codex native-discovery paths instead of OpenCode's native skill tool.
- You are running an opencode reviewer call without `--agent plan`.
- You did not announce which skill you invoked and why.
- You are proceeding to implementation review with failing lint/typecheck/tests.
- You are echoing raw secret-scan matches to the user or logs.
- You are pushing without explicit user approval.
@@ -0,0 +1,144 @@
# Task Plan: [Short Title]
> **Variant guardrail (OpenCode):** Sub-skills (`brainstorming`, `test-driven-development`, `verification-before-completion`, `finishing-a-development-branch`, `using-git-worktrees`) MUST be invoked through OpenCode's native skill tool from `~/.config/opencode/skills/superpowers/<skill>/SKILL.md`. Phase 1 MUST include the Bootstrap Superpowers Context step. Opencode reviewer calls MUST use `--agent plan` to stay read-only.
## Metadata
| Field | Value |
|-------|-------|
| Created | YYYY-MM-DD |
| Slug | YYYY-MM-DD-<slug> |
| Runtime | opencode |
| Reviewer CLI | codex \| claude \| cursor \| opencode |
| Reviewer Model | <model> |
| MAX_ROUNDS | 10 |
| Branch Strategy | current-branch \| worktree |
| Branch Name | <current branch name, or new branch name when worktree is used> |
| Worktree Path | <absolute path to worktree dir; blank when Branch Strategy = current-branch> |
| Status | draft |
### Status Enum (authoritative)
| Value | Meaning |
|-------|---------|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |
---
## Prompt
<!-- Exact user prompt, verbatim. -->
## Interpretation
<!-- Short restatement of goal + out-of-scope items. -->
## Assumptions
<!-- Anything we're assuming and needs confirmation. Empty list OK after clarifying questions. -->
## Files
<!-- Files expected to be created / modified / deleted. Paths are absolute or repo-relative. -->
| Action | Path | Why |
|--------|------|-----|
| | | |
## Approach
<!-- 3-10 bullets describing implementation order. -->
## TDD Approach
<!-- One of:
(a) **TDD applies** — list the failing test(s) to write first, then implementation, then confirm green.
(b) **TDD auto-skipped** — reason must be exactly one of:
- `pure-documentation`
- `pure-comment-whitespace-rename`
(c) **TDD user-approved skip** — user explicitly approved skipping TDD for this task.
Record the approval timestamp (ISO-8601) and the specific reason (e.g., `pure-config-addition`).
-->
## Acceptance Criteria
- [ ] <criterion 1>
- [ ] <criterion 2>
## Verification
<!-- Commands to run:
lint: <cmd>
typecheck: <cmd>
tests: <cmd>
-->
## Rollback
<!-- How to undo: `git revert <hash>`, or manual steps if the change is not easily revertable. -->
---
## Runtime State
<!-- Updated by the skill at runtime. Used to detect resume and to persist reviewer session IDs across rounds. -->
```yaml
plan_review_round: 0
implementation_review_round: 0
CODEX_PLAN_SESSION_ID:
CODEX_IMPL_SESSION_ID:
CURSOR_PLAN_SESSION_ID:
CURSOR_IMPL_SESSION_ID:
OPENCODE_PLAN_SESSION_ID:
OPENCODE_IMPL_SESSION_ID:
last_phase_entered:
last_round_ts:
last_scan_outcome_plan:
last_scan_outcome_impl:
verification_attempts: 0
tests_added_count: 0
tdd_used: false
```
## Review History
<!-- Append one entry per reviewer round, both loops. -->
| Timestamp (ISO-8601) | Loop | Round | Verdict | Summary |
|----------------------|------|-------|---------|---------|
| | | | | |
## Final Status
<!-- Filled at the terminal outcome (phase 9/10). Populate at least:
- Terminal status (one of the 10 Status enum values)
- Commit hash (if any)
- Plan-review rounds used / MAX_ROUNDS
- Implementation-review rounds used / MAX_ROUNDS
- TDD used (true|false)
- Tests added count
- Verification attempts used
- Last round ISO-8601 timestamp
- Notes (anything the user should know when revisiting)
-->
---
## Guardrails (do NOT remove)
- This file is the single persistent artifact for `do-task`. Do not split it or delete it on success.
- `Status` must always match one of the 10 enum values.
- `Runtime State` is updated by the skill, not by the user.
- Review History is append-only.
- `last_scan_outcome_plan` and `last_scan_outcome_impl` record the most recent secret-scan result for each loop. They are informational; the scan itself runs per-payload with no caching.