stefano 251148c3ff
Perform code optimization and document cleanup (#1)
## Summary
- add repository-wide quality tooling and verification scaffolding, including CI workflows, pnpm workspace setup, ESLint/Prettier/markdown checks, and generated-output verification helpers
- reorganize skill sources and generation flow by introducing canonical `_source` variants, generator/manifests, reusable helper abstractions, and shared web-automation/browser utilities
- clean up and expand documentation so the root README flows into docs and skill docs, with clearer development, reviewer, installer, and workflow guidance

## Notable changes
- docs flow and consistency cleanup across `README.md`, `docs/README.md`, and related docs
- new scripts for `check`, docs verification, generated-file verification, shell portability, and safe directory replacement
- refactors in Atlassian and web-automation skill runtimes to reduce duplication and centralize reusable code
- changelog, development documentation, and CI surface updates

## Test Plan
- [ ] `pnpm run check`
- [ ] review generated/manifests and skill sync outputs
- [ ] smoke-check docs flow from `README.md` to `docs/README.md` to skill docs

## Notes
- this branch currently includes tracked `skills/web-automation/shared/node_modules` content that should be reviewed carefully as potentially noisy/accidental committed artifacts

Co-authored-by: Stefano Fiorini <stefano.fiorini@firsthorizon.com>
Reviewed-on: #1
2026-05-04 04:41:34 +00:00


DO-TASK

Purpose

Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review), with TDD-first execution, a pre-implementation verification gate, and a single task commit — all in one run of the skill. do-task is scoped to small-to-medium ad-hoc tasks; for multi-milestone work use create-plan + implement-plan instead.

do-task persists one plan artifact per run: ai_plan/YYYY-MM-DD-<slug>/task-plan.md. The folder is kept as a record after success (not deleted). Resume is supported via the Status enum and Runtime State fields.

Requirements

  • Git repo with /ai_plan/ entry in .gitignore (the skill adds the entry automatically if missing and commits it as a separate infra commit).
  • Superpowers skills installed from: https://github.com/obra/superpowers
  • Required dependencies (vary by variant; see Install below):
    • superpowers:brainstorming (or superpowers/brainstorming for OpenCode)
    • superpowers:test-driven-development
    • superpowers:verification-before-completion
    • superpowers:finishing-a-development-branch
    • superpowers:using-git-worktrees (only when the prompt opts in to a worktree)
  • For Codex, native skill discovery must be configured:
    • ~/.agents/skills/superpowers -> ~/.codex/superpowers/skills
  • Cursor can use the Cursor Superpowers plugin cache or manual .cursor/skills/superpowers/skills / ~/.cursor/skills/superpowers/skills installs, and jq is a hard prerequisite for the Cursor variant.
  • OpenCode can use ~/.agents/skills/superpowers or ~/.config/opencode/skills/superpowers.
  • The shared reviewer runtime (run-review.sh) and the Telegram notifier helper (notify-telegram.sh) must be installed beside the agent skills. Both scripts ship under skills/reviewer-runtime/ in this repo (Pi uses the copies under skills/reviewer-runtime/pi/) and must be copied into the per-variant location:
    • Codex: ~/.codex/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}
    • Claude Code: ~/.claude/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}
    • OpenCode: ~/.config/opencode/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}
    • Cursor: .cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh} (repo-local, preferred) or ~/.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh} (global fallback)
    • Pi: .pi/skills/reviewer-runtime/pi/{run-review.sh,notify-telegram.sh} (repo-local) or ~/.pi/agent/skills/reviewer-runtime/pi/{run-review.sh,notify-telegram.sh} (global)
  • Variant-specific prerequisites:
    • Claude Code: claude --version, explicit Skill-tool invocation of sub-skills.
    • Codex: codex --version; ~/.agents/skills/superpowers -> ~/.codex/superpowers/skills symlink present.
    • Cursor: cursor-agent --version, jq --version (hard prereq), Superpowers available from the Cursor plugin cache or manual Cursor skill roots.
    • OpenCode: opencode --version; Superpowers available from ~/.agents/skills/superpowers or ~/.config/opencode/skills/superpowers; Phase 1 runs Bootstrap Superpowers Context.
    • Pi: pi --version; Superpowers installed as described in PI-SUPERPOWERS.md.
  • Telegram notification setup is documented in TELEGRAM-NOTIFICATIONS.md
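A minimal sketch of the .gitignore requirement, assuming the entry is matched as the exact line /ai_plan/. The function name ensure_ai_plan_ignored is hypothetical, and the sketch only edits the file; committing stays a separate step, shown in the trailing comment with the commit message from Key Behavior.

```bash
# Hypothetical helper: make sure /ai_plan/ is listed in a repo's .gitignore.
# Prints "added" when it appended the entry, "present" when nothing changed.
ensure_ai_plan_ignored() {
  local repo=${1:-.}
  local gitignore="$repo/.gitignore"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  if [ -f "$gitignore" ] && grep -qxF '/ai_plan/' "$gitignore"; then
    echo present
    return 0
  fi
  # If the file exists and its last byte is not a newline, add one first.
  if [ -s "$gitignore" ] && [ "$(tail -c 1 "$gitignore")" != "" ]; then
    echo >> "$gitignore"
  fi
  echo '/ai_plan/' >> "$gitignore"
  echo added
}

# When it prints "added", commit the change on its own (separate infra commit):
#   git add .gitignore
#   git commit -m "chore(gitignore): ignore ai_plan local planning artifacts"
```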

Dependency-missing messages are variant-specific:

  • Claude Code: Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.
  • Codex: Missing dependency: [specific missing item]. Install required Superpowers skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.
  • Cursor: Missing dependency: [specific missing item]. Install Cursor Agent CLI, jq, and the Cursor Superpowers plugin or Superpowers skills under .cursor/skills/ or ~/.cursor/skills/, then retry.
  • OpenCode: Missing dependency: [specific missing item]. Install required OpenCode Superpowers skills (https://github.com/obra/superpowers, OpenCode setup) and the reviewer-runtime helper, then retry.
  • Pi: Missing dependency: [specific missing item]. Install Pi, required Superpowers skills, and the Pi reviewer-runtime helper, then retry.
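As a hedged illustration, a pre-flight check for the CLI binaries can print messages in the same "Missing dependency: ..." shape. require_cmds is a hypothetical name, and its remediation text is simplified rather than the variant-specific wording above.

```bash
# Hypothetical pre-flight check: report every command that is not on PATH
# and return non-zero if any were missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'Missing dependency: %s. Install it, then retry.\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example for the Cursor variant, where jq is a hard prerequisite:
#   require_cmds cursor-agent jq || exit 1
```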

Reviewer CLI Requirements

The canonical reviewer CLI support matrix is documented in REVIEWERS.md. At least one of these CLIs must be installed to drive the two review loops (none is needed if you choose to skip review):

| Reviewer CLI | Install | Verify | Read-Only Mode | Session Resume |
| --- | --- | --- | --- | --- |
| codex | `npm install -g @openai/codex` | `codex --version` | `-s read-only` | Yes (`codex exec resume <id>`) |
| claude | `npm install -g @anthropic-ai/claude-code` | `claude --version` | `--strict-mcp-config --setting-sources user` | No (fresh call each round) |
| cursor | `curl https://cursor.com/install -fsS \| bash` | `cursor-agent --version` (binary: cursor-agent; alias `cursor agent` also works) | `--mode=ask` | Yes (`--resume <id>`) |
| opencode | `brew install opencode` or your package manager | `opencode --version` | `--agent plan` | Opt-in (`-s <id>`; fresh call is the default) |
| pi | Install Pi coding agent | `pi --version`; list models with `pi --list-models [search]` | `--tools read,grep,find,ls` | No (fresh call each round) |

The reviewer CLI is independent of which agent is running the skill — e.g., Claude Code can send both the plan and the implementation to Codex for review.

Additional dependency for the cursor reviewer: jq is required to parse Cursor's JSON output (the opencode output capture listed under Supported Reviewer CLIs also uses jq). Install via brew install jq (macOS) or your package manager. Verify: jq --version. The cursor variant of do-task makes jq a hard prerequisite regardless of which reviewer CLI is selected.

Install

Codex

```bash
mkdir -p ~/.codex/skills/do-task
cp -R skills/do-task/codex/* ~/.codex/skills/do-task/
mkdir -p ~/.codex/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.codex/skills/reviewer-runtime/
chmod +x ~/.codex/skills/reviewer-runtime/*.sh
```

Claude Code

```bash
mkdir -p ~/.claude/skills/do-task
cp -R skills/do-task/claude-code/* ~/.claude/skills/do-task/
mkdir -p ~/.claude/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.claude/skills/reviewer-runtime/
chmod +x ~/.claude/skills/reviewer-runtime/*.sh
```

OpenCode

```bash
mkdir -p ~/.config/opencode/skills/do-task
cp -R skills/do-task/opencode/* ~/.config/opencode/skills/do-task/
mkdir -p ~/.config/opencode/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.config/opencode/skills/reviewer-runtime/
chmod +x ~/.config/opencode/skills/reviewer-runtime/*.sh
```

Cursor

Copy into the repo-local .cursor/skills/ directory (where the Cursor Agent CLI discovers skills):

```bash
mkdir -p .cursor/skills/do-task
cp -R skills/do-task/cursor/* .cursor/skills/do-task/
mkdir -p .cursor/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh .cursor/skills/reviewer-runtime/
chmod +x .cursor/skills/reviewer-runtime/*.sh
```

Or install globally (loaded via ~/.cursor/skills/):

```bash
mkdir -p ~/.cursor/skills/do-task
cp -R skills/do-task/cursor/* ~/.cursor/skills/do-task/
mkdir -p ~/.cursor/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.cursor/skills/reviewer-runtime/
chmod +x ~/.cursor/skills/reviewer-runtime/*.sh
```
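The four manual installs above (Codex, Claude Code, OpenCode, Cursor) repeat one pattern, so a small wrapper can parameterize it. This is a sketch, not a script shipped in this repo: install_do_task and its argument order are hypothetical. It copies with src/. rather than src/*, which also picks up dotfiles.

```bash
# Hypothetical helper that repeats the per-variant copy steps.
#   $1 = path to this repo checkout
#   $2 = variant directory under skills/do-task/ (codex, claude-code, opencode, cursor)
#   $3 = skills root to install into (e.g. ~/.codex/skills or .cursor/skills)
install_do_task() {
  local repo=$1 variant=$2 root=$3
  if [ ! -d "$repo/skills/do-task/$variant" ]; then
    printf 'unknown variant: %s\n' "$variant" >&2
    return 1
  fi
  mkdir -p "$root/do-task" "$root/reviewer-runtime" || return 1
  # "dir/." copies the directory contents, including dotfiles.
  cp -R "$repo/skills/do-task/$variant/." "$root/do-task/" || return 1
  cp "$repo/skills/reviewer-runtime/run-review.sh" \
     "$repo/skills/reviewer-runtime/notify-telegram.sh" \
     "$root/reviewer-runtime/" || return 1
  chmod +x "$root/reviewer-runtime/"*.sh
}

# Example: install_do_task . codex ~/.codex/skills
```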

Pi

Recommended full Pi package install:

```bash
./scripts/install-pi-package.sh --global
# or, for project-local Pi package install
./scripts/install-pi-package.sh --local
```

Manual single-skill Pi install from the package mirror:

```bash
pnpm run sync:pi
mkdir -p .pi/skills/do-task
cp -R pi-package/skills/do-task/* .pi/skills/do-task/
mkdir -p .pi/skills/reviewer-runtime/pi
cp -R skills/reviewer-runtime/pi/* .pi/skills/reviewer-runtime/pi/
chmod +x .pi/skills/reviewer-runtime/pi/*.sh
```

Global manual installs use ~/.pi/agent/skills/do-task/ and ~/.pi/agent/skills/reviewer-runtime/pi/ instead of .pi/skills/....

Pi workflow skills also require Superpowers. See PI-SUPERPOWERS.md and PI-COMMON-REVIEWER.md.

Verify Installation

Run the per-variant checks for everything the corresponding SKILL.md enforces. Each variant's checks cover: (1) CLI binary version, (2) skill file presence, (3) reviewer-runtime and notifier helper presence, (4) Superpowers sub-skill discovery, and (5) variant-specific extras such as the Superpowers symlink for Codex and jq for Cursor.

Codex Verify

```bash
codex --version
test -f ~/.codex/skills/do-task/SKILL.md
test -x ~/.codex/skills/reviewer-runtime/run-review.sh
test -x ~/.codex/skills/reviewer-runtime/notify-telegram.sh
test -L ~/.agents/skills/superpowers
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```

Claude Code Verify

```bash
claude --version
test -f ~/.claude/skills/do-task/SKILL.md
test -x ~/.claude/skills/reviewer-runtime/run-review.sh
test -x ~/.claude/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.claude/skills/superpowers/brainstorming/SKILL.md
test -f ~/.claude/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.claude/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.claude/skills/superpowers/finishing-a-development-branch/SKILL.md
```

OpenCode Verify

```bash
opencode --version
test -f ~/.config/opencode/skills/do-task/SKILL.md
test -x ~/.config/opencode/skills/reviewer-runtime/run-review.sh
test -x ~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md || test -f ~/.config/opencode/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.config/opencode/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.config/opencode/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.config/opencode/skills/superpowers/finishing-a-development-branch/SKILL.md
```

Cursor Verify

```bash
cursor-agent --version
jq --version
test -f .cursor/skills/do-task/SKILL.md || test -f ~/.cursor/skills/do-task/SKILL.md
test -x .cursor/skills/reviewer-runtime/run-review.sh || test -x ~/.cursor/skills/reviewer-runtime/run-review.sh
test -x .cursor/skills/reviewer-runtime/notify-telegram.sh || test -x ~/.cursor/skills/reviewer-runtime/notify-telegram.sh
test -f .cursor/skills/superpowers/skills/brainstorming/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/brainstorming/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/brainstorming/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/test-driven-development/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/test-driven-development/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/test-driven-development/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/verification-before-completion/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/finishing-a-development-branch/SKILL.md' -print -quit 2>/dev/null | grep -q .
```

Pi Verify

```bash
pi --version
test -f .pi/skills/do-task/SKILL.md || test -f ~/.pi/agent/skills/do-task/SKILL.md
test -x .pi/skills/reviewer-runtime/pi/run-review.sh || test -x ~/.pi/agent/skills/reviewer-runtime/pi/run-review.sh
test -x .pi/skills/reviewer-runtime/pi/notify-telegram.sh || test -x ~/.pi/agent/skills/reviewer-runtime/pi/notify-telegram.sh
test -f .pi/skills/superpowers/brainstorming/SKILL.md || test -f ~/.pi/agent/skills/superpowers/brainstorming/SKILL.md || test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f .pi/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.pi/agent/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f .pi/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.pi/agent/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f .pi/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.pi/agent/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```
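Run as a plain script, the test lines above fail silently: test prints nothing, and without set -e execution continues past a failing check. A hypothetical run_checks wrapper, sketched below, reports each check and returns non-zero if any failed. Each check is passed as a string so the "a || b" alternatives still work.

```bash
# Hypothetical wrapper: run each check in its own sh, print ok/FAIL per check,
# and return non-zero if any check failed (instead of stopping at the first).
run_checks() {
  local failed=0 check
  for check in "$@"; do
    if sh -c "$check" >/dev/null 2>&1; then
      printf 'ok   %s\n' "$check"
    else
      printf 'FAIL %s\n' "$check"
      failed=1
    fi
  done
  return "$failed"
}

# Example (first Codex checks; ~ is expanded by the inner sh):
#   run_checks 'codex --version' \
#     'test -f ~/.codex/skills/do-task/SKILL.md' \
#     'test -x ~/.codex/skills/reviewer-runtime/run-review.sh'
```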

Key Behavior

  • Creates one persistent plan artifact at ai_plan/YYYY-MM-DD-<slug>/task-plan.md.
  • Ensures /ai_plan/ is in .gitignore. If it is missing, adds it and creates a separate commit with the message chore(gitignore): ignore ai_plan local planning artifacts.
  • Parses the user prompt, detects the trigger phrase, and asks 1-3 clarifying questions unless the prompt already has a concrete target + outcome + unambiguous scope + resolvable identifiers.
  • Invokes superpowers:brainstorming for any behavior-changing task (feature creation, non-trivial bug fix, refactor, design decision). The only skip conditions are pure-documentation and pure-comment-whitespace-rename.
  • Asks which reviewer CLI, model, and max rounds to use (or accepts skip to run without review). "Use defaults" maps to codex / gpt-5.4 / MAX_ROUNDS=10.
  • Runs the plan review loop (Phase 5) before implementation, iterating until the reviewer returns VERDICT: APPROVED or MAX_ROUNDS (default 10) is reached.
  • Executes with TDD-first (Phase 6) via superpowers:test-driven-development. Auto-skip permitted only for pure-documentation and pure-comment-whitespace-rename; all other skips (including config-file additions) require explicit user approval, recorded in the TDD Approach section with an ISO-8601 timestamp.
  • Runs lint/typecheck/tests as a verification gate (Phase 7) before the implementation review loop.
  • Runs the implementation review loop (Phase 8) against the diff + verification output, iterating up to MAX_ROUNDS or until APPROVED.
  • Scans every outbound reviewer payload for secrets (subroutine step 1a), once per payload and with no caching.
  • Creates a single commit after the implementation review approves. Does NOT push. Asks the user for explicit yes before any push.
  • Defaults to the current branch. Worktree only on explicit opt-in ("in a worktree", "use a worktree", "on an isolated branch", "on a new branch called X").
  • Supports resume: detects existing folder by slug and uses Status + Runtime State to decide how to re-enter.
  • Sends completion notifications through Telegram only when the shared setup in TELEGRAM-NOTIFICATIONS.md is installed and configured.
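A rough sketch of the worktree opt-in detection, assuming a case-insensitive match on the four trigger phrases listed above; the skill may interpret the prompt more loosely than this. wants_worktree is a hypothetical name.

```bash
# Hypothetical check: succeed only if the prompt contains one of the explicit
# worktree opt-in phrases. "on a new branch called X" needs a name after it.
wants_worktree() {
  printf '%s\n' "$1" | grep -Eiq \
    'in a worktree|use a worktree|on an isolated branch|on a new branch called [^[:space:]]+'
}

# Example: if wants_worktree "$PROMPT"; then echo "use superpowers:using-git-worktrees"; fi
```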

Dual Review Loops

do-task runs two review loops per successful run, each with its own session ID (for reviewer CLIs that resume sessions), so reviewer context never leaks from one loop into the other.

  1. Plan review loop (Phase 5) — payload is the current task-plan.md with Runtime State and Review History stripped. The reviewer evaluates whether the plan matches the prompt, whether assumptions are surfaced, whether acceptance criteria are testable, whether the TDD approach is appropriate, and whether there are missing files/risks/security concerns.
  2. Implementation review loop (Phase 8) — payload is the approved task plan (without Runtime State) + git diff (unstaged + staged) + verification output (lint, typecheck, tests). The reviewer evaluates correctness, code quality, test coverage, security, and regression risk.
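Both payloads start from task-plan.md with some ## sections removed. The awk sketch below assumes each section starts at a line beginning with "## " and runs until the next such heading; strip_sections is a hypothetical name, and the git diff commands in the comment cover the unstaged and staged changes.

```bash
# Hypothetical sketch: print a markdown file while dropping whole "## <name>"
# sections (the heading through the line before the next "## " heading).
strip_sections() {
  local file=$1 names
  shift
  # Pass the section names to awk as one "|"-delimited string.
  names=$(printf '%s|' "$@")
  awk -v names="|$names" '
    /^## / {
      title = substr($0, 4)                      # text after "## "
      skip = (index(names, "|" title "|") > 0)   # exact name match
    }
    !skip { print }
  ' "$file"
}

# Plan payload:
#   strip_sections task-plan.md "Runtime State" "Review History" > "/tmp/do-task-plan-$REVIEW_ID.md"
# Implementation payload (verification output file name is hypothetical):
#   { strip_sections task-plan.md "Runtime State"; git diff; git diff --cached;
#     cat verification-output.txt; } > "/tmp/do-task-implementation-$REVIEW_ID.md"
```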

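Both loops share the same 9-step subroutine and the same MAX_ROUNDS limit (default 10); each loop keeps its own round counter (plan_review_round and implementation_review_round in Runtime State).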

Subroutine Steps (inside each review loop)

  1. Write payload to /tmp/do-task-<kind>-<REVIEW_ID>.md.
  2. Secret scan (step 1a) — per-payload, no caching. See Secret Scan section below.
  3. Generate reviewer command script at /tmp/do-task-<kind>-review-<REVIEW_ID>.sh.
  4. Run via reviewer-runtime/run-review.sh.
  5. Promote reviewer output and capture the session ID on Round 1; persist it to task-plan.md Runtime State under the loop-specific variable (CODEX_PLAN_SESSION_ID, CODEX_IMPL_SESSION_ID, CURSOR_PLAN_SESSION_ID, CURSOR_IMPL_SESSION_ID, OPENCODE_PLAN_SESSION_ID, or OPENCODE_IMPL_SESSION_ID).
  6. Parse verdict; append an entry to Review History; bump the round counter.
  7. Branch: APPROVED → exit, REVISE → caller revises and re-enters, MAX_ROUNDS → caller decides.
  8. Liveness contract: keep waiting as long as "In progress N" heartbeats keep arriving from the runner.
  9. Cleanup temp artifacts on success.
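Steps 5 to 7 reduce to two small decisions: which Runtime State key holds the session ID, and what to do after a verdict. The helpers below are a hypothetical sketch of that bookkeeping, assuming the round counter holds the number of the round just completed and that claude and pi keep no session.

```bash
# Hypothetical: session_var CLI KIND -> Runtime State key for that loop's session.
session_var() {
  local cli kind
  cli=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case $2 in
    plan) kind=PLAN ;;
    implementation|impl) kind=IMPL ;;
    *) return 1 ;;
  esac
  case $cli in
    CODEX|CURSOR|OPENCODE) printf '%s_%s_SESSION_ID\n' "$cli" "$kind" ;;
    *) return 1 ;;  # claude and pi start a fresh call every round
  esac
}

# Hypothetical: next_action VERDICT ROUND MAX_ROUNDS -> approved | revise | max-rounds
next_action() {
  if [ "$1" = APPROVED ]; then
    echo approved            # exit the loop
  elif [ "$2" -ge "$3" ]; then
    echo max-rounds          # caller decides (e.g. abort)
  else
    echo revise              # caller revises and re-enters
  fi
}
```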

Reviewer Output Contract

  • P0 = total blocker
  • P1 = major risk
  • P2 = must-fix before approval
  • P3 = cosmetic / nice to have
  • Each severity section uses - None. when empty.
  • VERDICT: APPROVED is valid only when no P0, P1, or P2 findings remain.
  • P3 findings are non-blocking, but the caller should still try to fix them when cheap and safe.
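The approval rule can be checked mechanically. This sketch assumes a layout the contract does not fully pin down: each severity section opens with a heading such as "### P1", findings are "- " bullets, and the verdict is a line starting with "VERDICT:". check_verdict is hypothetical.

```bash
# Hypothetical contract check: print the verdict, and return 1 if it says
# APPROVED while any P0, P1, or P2 section still lists a finding.
check_verdict() {
  awk '
    /^#+[ \t]*P[0-3]/ {                    # severity heading, e.g. "### P2"
      sev = $0; sub(/^#+[ \t]*/, "", sev); sev = substr(sev, 1, 2); next
    }
    /^#/ { sev = ""; next }               # any other heading ends the section
    /^VERDICT:/ { verdict = $2 }
    sev ~ /^P[012]$/ && /^- / && $0 != "- None." { blocking++ }
    END {
      print verdict
      if (verdict == "APPROVED" && blocking > 0) exit 1
      exit 0
    }
  ' "$1"
}
```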

Runtime Artifacts

Per review loop (<kind> = plan or implementation):

  • /tmp/do-task-<kind>-<REVIEW_ID>.md — payload
  • /tmp/do-task-<kind>-review-<REVIEW_ID>.md — normalized review text
  • /tmp/do-task-<kind>-review-<REVIEW_ID>.json — raw JSON (cursor always; opencode with --format json)
  • /tmp/do-task-<kind>-review-<REVIEW_ID>.stderr — reviewer stderr
  • /tmp/do-task-<kind>-review-<REVIEW_ID>.status — helper heartbeat/status log
  • /tmp/do-task-<kind>-review-<REVIEW_ID>.runner.out — helper-managed stdout
  • /tmp/do-task-<kind>-review-<REVIEW_ID>.sh — reviewer command script
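Every artifact path is derived from the loop kind and REVIEW_ID, so a hypothetical helper can list them and clean up after a round. The cleanup rule here simplifies Failure Handling: everything is removed after a completed round, and everything is kept otherwise.

```bash
# Hypothetical: list the seven per-round temp files, in the order listed above.
review_artifacts() {
  local kind=$1 id=$2 ext
  printf '/tmp/do-task-%s-%s.md\n' "$kind" "$id"
  for ext in md json stderr status runner.out sh; do
    printf '/tmp/do-task-%s-review-%s.%s\n' "$kind" "$id" "$ext"
  done
}

# Hypothetical: remove them only after a completed round; otherwise keep them
# so .stderr, .status, and .runner.out remain available for diagnosis.
cleanup_round() {
  local kind=$1 id=$2 outcome=$3
  [ "$outcome" = completed ] || return 0
  review_artifacts "$kind" "$id" | while IFS= read -r path; do
    rm -f "$path"
  done
}
```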

Status log lines use this format:

```
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
```

in-progress is the liveness heartbeat, emitted roughly once per minute with note="In progress N". stall-warning is a non-terminal status-log state only: it does not mean the caller should stop waiting while in-progress heartbeats continue.
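A minimal parser for these status lines, assuming fields appear exactly as in the format above (unquoted values without spaces, note quoted). keep_waiting encodes the rule that running states and stall-warning are non-terminal; treating the other four states as stop-and-inspect points is an assumption based on Failure Handling. All function names are hypothetical.

```bash
# Hypothetical field extractors for one status-log line.
status_state() {
  printf '%s\n' "$1" | sed -n 's/.* state=\([^ ]*\).*/\1/p'
}
status_note() {
  printf '%s\n' "$1" | sed -n 's/.* note="\([^"]*\)".*/\1/p'
}

# Succeed while the caller should keep waiting for the reviewer.
keep_waiting() {
  case $(status_state "$1") in
    running-silent|running-active|in-progress|stall-warning) return 0 ;;
    *) return 1 ;;  # completed, completed-empty-output, failed, needs-operator-decision
  esac
}
```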

Persistent Artifact

The one file kept across runs is ai_plan/YYYY-MM-DD-<slug>/task-plan.md. Its Status enum drives resume decisions:

| Status | Meaning |
| --- | --- |
| draft | Newly created; plan review not yet started |
| plan-approved | Plan review loop returned APPROVED |
| implementation-in-progress | Phase 6 executing |
| implementation-approved | Phase 8 review loop returned APPROVED; awaiting commit |
| pushed | Committed + pushed to remote |
| local-only | Committed locally; user declined push |
| aborted-plan-review | MAX_ROUNDS reached in Phase 5; user aborted |
| aborted-impl-review | MAX_ROUNDS reached in Phase 8; user aborted |
| aborted-verification | Phase 7 retries exhausted; user aborted |
| failed | Hard tooling failure |
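One way to read the table is as a resume switch. The mapping below is a hypothetical sketch based only on the Meaning column and the phase numbers in this README; the skill itself also consults Runtime State (for example last_phase_entered) before re-entering.

```bash
# Hypothetical resume switch: Status value -> where a resumed run re-enters.
resume_point() {
  case $1 in
    draft)                      echo phase-5-plan-review ;;
    plan-approved)              echo phase-6-implementation ;;
    implementation-in-progress) echo phase-6-implementation ;;
    implementation-approved)    echo phase-9-commit ;;
    pushed|local-only)          echo done ;;
    aborted-*|failed)           echo ask-user ;;
    *) printf 'unknown status: %s\n' "$1" >&2; return 1 ;;
  esac
}
```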

Failure Handling

  • completed-empty-output — the reviewer exited without producing review text; surface .stderr and .status, then retry only after diagnosing the cause.
  • needs-operator-decision — the helper reached hard-timeout escalation; surface .status and decide whether to extend the timeout, abort, or retry with different parameters.
  • Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds retain .stderr, .status, and .runner.out until diagnosed.
  • Verification gate (Phase 7) retries up to 3 times. On exhaustion, Status becomes aborted-verification and the user is asked whether to retry, override, or abort.
  • As long as fresh in-progress heartbeats continue to arrive roughly once per minute, the caller keeps waiting.
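A sketch of the Phase 7 retry loop, assuming "retries up to 3 times" means three attempts in total and that the agent fixes the failure between attempts (not shown). verification_gate and the example command are hypothetical.

```bash
# Hypothetical verification gate: run the command up to MAX attempts and stop
# at the first success; report aborted-verification when attempts run out.
verification_gate() {
  local max=$1 attempt=1
  shift
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      echo "passed on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo aborted-verification
  return 1
}

# Example (hypothetical project commands):
#   verification_gate 3 sh -c 'npm run lint && npm test'
```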

Secret Scan (subroutine step 1a; per-payload; no caching)

Every outbound reviewer payload is scanned before being sent to the reviewer CLI. This scan runs on every round of both loops. No results are cached, because the Phase 8 payload includes newly-introduced diff content that earlier rounds never saw.

Canonical anchored regex list (10 patterns):

```
AWS access key:     AKIA[0-9A-Z]{16}
GCP service-acct:   "type"\s*:\s*"service_account"
GitHub tokens:      (ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
Slack tokens:       xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
                    xox[abpsr]-[A-Za-z0-9]{10,48}
OpenAI API keys:    sk-(proj-)?[A-Za-z0-9_-]{20,}
Anthropic API keys: sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
PEM private keys:   -----BEGIN [A-Z ]+ PRIVATE KEY-----
.env-style:         (TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
JWT:                eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
```

If a match is found, the skill replaces the matched text with the fixed token [REDACTED:<pattern-label>:<match-length>-chars] before showing it to the user (pattern labels: aws-access-key, gcp-service-account, github-token, slack-token, openai-key, anthropic-key, pem-private-key, dotenv-style, jwt). File paths and line numbers are kept. Raw match text is never echoed to the terminal, the chat log, or any persistent file.

The user answers yes / no / redact:

  • yes — proceed; Runtime State records last_scan_outcome_<kind>=user-approved-with-matches.
  • redact — the user supplies redactions, the skill applies them, and re-scans before sending. Runtime State records last_scan_outcome_<kind>=redacted-and-approved.
  • no — stop the loop, set Status: failed, send Telegram summary.
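The scan can be sketched with grep, assuming POSIX extended regular expressions (so \s is written as [[:space:]]). The labels and the 10 patterns come from the list above; secret_patterns and scan_payload are hypothetical names. Only redaction tokens with file and line numbers are printed; the matched text is never echoed.

```bash
# Hypothetical pattern table: "<label> <ERE>" per line (ERE may contain spaces).
secret_patterns() {
  cat <<'EOF'
aws-access-key AKIA[0-9A-Z]{16}
gcp-service-account "type"[[:space:]]*:[[:space:]]*"service_account"
github-token (ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
slack-token xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
slack-token xox[abpsr]-[A-Za-z0-9]{10,48}
openai-key sk-(proj-)?[A-Za-z0-9_-]{20,}
anthropic-key sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
pem-private-key -----BEGIN [A-Z ]+ PRIVATE KEY-----
dotenv-style (TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)[[:space:]]*=[[:space:]]*["']?[A-Za-z0-9+/=_-]{8,}
jwt eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
EOF
}

# Hypothetical scanner: print one redaction token per match; return 1 on any match.
scan_payload() {
  local payload=$1 findings
  findings=$(
    secret_patterns | while read -r label pattern; do
      # -n: line numbers, -o: one output line per match, -E: extended regex
      grep -noE -- "$pattern" "$payload" | while IFS= read -r hit; do
        match=${hit#*:}   # used only for its length, never printed
        printf '%s:%s: [REDACTED:%s:%s-chars]\n' \
          "$payload" "${hit%%:*}" "$label" "${#match}"
      done
    done
  )
  [ -z "$findings" ] && return 0
  printf '%s\n' "$findings"
  return 1
}
```

Because the OpenAI pattern's character class includes "-", an Anthropic key such as sk-ant-api03-... is also reported under openai-key; a real implementation may want to deduplicate overlapping findings.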

Supported Reviewer CLIs

| CLI | Round-1 command | Round-N resume | Output capture |
| --- | --- | --- | --- |
| codex | `codex exec -m <model> -s read-only -o <out.md> "<prompt>"` | `codex exec resume <session-id> -o <out.md> "<prompt>"` | `<out.md>` directly (helper `--success-file`) |
| claude | `claude -p "<prompt>" --model <model> --strict-mcp-config --setting-sources user` | Fresh call with prior-round context summary | `cp <runner.out> <out.md>` |
| cursor | `cursor-agent -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `cursor-agent --resume <id> -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `jq -r '.result' <out.json> > <out.md>` |
| opencode | `opencode run -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` | Fresh call (default), or opt-in `opencode run -s <id> -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` | `jq -r '.[] \| select(.type == "message" and .role == "assistant") \| .content' <out.json> > <out.md>` |
| pi | See PI-COMMON-REVIEWER.md | Fresh call | Markdown stdout copied to `<out.md>` |

For all supported reviewer CLIs, the preferred execution path is:

  1. Write the reviewer command to a bash script.
  2. Run that script through reviewer-runtime/run-review.sh.
  3. Fall back to direct synchronous execution only if the helper is missing or not executable.
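The preferred path and its fallback can be sketched as follows. The sketch uses only the --command-file, --stdout-file, --stderr-file, and --status-file flags that this README documents for run-review.sh and relies on the helper's defaults for everything else; run_reviewer is a hypothetical name.

```bash
# Hypothetical runner: use the reviewer-runtime helper when it is executable,
# otherwise run the reviewer command script directly and synchronously.
run_reviewer() {
  local helper=$1 script=$2 out=$3 err=$4 status=$5
  if [ -x "$helper" ]; then
    "$helper" --command-file "$script" --stdout-file "$out" \
      --stderr-file "$err" --status-file "$status"
  else
    printf 'reviewer-runtime helper not executable; running %s directly\n' "$script" >&2
    bash "$script" >"$out" 2>"$err"
  fi
}

# Example (Codex paths):
#   run_reviewer ~/.codex/skills/reviewer-runtime/run-review.sh \
#     "/tmp/do-task-plan-review-$REVIEW_ID.sh" \
#     "/tmp/do-task-plan-review-$REVIEW_ID.runner.out" \
#     "/tmp/do-task-plan-review-$REVIEW_ID.stderr" \
#     "/tmp/do-task-plan-review-$REVIEW_ID.status"
```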

Pi Reviewer Support

All workflow variants can use Pi itself as a reviewer CLI. Use the pi/<pi-model-name> shorthand, for example pi/claude-opus-4-7, which means REVIEWER_CLI=pi and REVIEWER_MODEL=claude-opus-4-7. Only the first pi/ prefix is stripped, so provider-qualified or multi-slash Pi model IDs are kept intact: pi/anthropic/claude-opus-4-7 gives REVIEWER_MODEL=anthropic/claude-opus-4-7.
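The shorthand rule amounts to stripping one pi/ prefix. A minimal sketch, with parse_reviewer_spec as a hypothetical name that only handles the Pi form:

```bash
# Hypothetical parser: "pi/<model>" -> REVIEWER_CLI=pi, REVIEWER_MODEL=<model>.
# ${1#pi/} removes only the leading "pi/", so further slashes are kept.
parse_reviewer_spec() {
  case $1 in
    pi/?*) REVIEWER_CLI=pi; REVIEWER_MODEL=${1#pi/} ;;
    *) return 1 ;;  # not the Pi shorthand
  esac
}
```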

The canonical isolated read-only Pi reviewer flag contract lives in PI-COMMON-REVIEWER.md. This workflow passes the plan and implementation review payload at /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and expects the standard ## Summary, ## Findings, and ## Verdict response. Pi reviewer output is captured as markdown stdout, not JSON.

If the Pi reviewer model or provider is unavailable, surface the helper stderr/status and use pi --list-models [search] to inspect configured models.

Notifications

  • Telegram is the only supported notification path.
  • Shared setup: TELEGRAM-NOTIFICATIONS.md
  • Notification failures are non-blocking, but they must be surfaced to the user.
  • Before stopping for any user interaction, approval, or manual decision, the skill sends a Telegram summary first if configured.
  • Terminal outcomes that trigger Telegram: pushed, local-only, aborted-plan-review, aborted-impl-review, aborted-verification, failed.
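A sketch of these notification rules, assuming the terminal outcomes above are exactly the Status values that trigger a summary. The arguments to notify-telegram.sh are not documented in this README, so notify_nonblocking wraps whatever command you pass; both names are hypothetical.

```bash
# Hypothetical: succeed for the terminal Status values that trigger Telegram.
is_notify_status() {
  case $1 in
    pushed|local-only|aborted-plan-review|aborted-impl-review|aborted-verification|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical: run a notification command, surface a failure, never block.
notify_nonblocking() {
  if ! "$@"; then
    printf 'warning: Telegram notification failed: %s\n' "$*" >&2
  fi
  return 0
}
```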

The reviewer-runtime helper (run-review.sh) also supports manual override flags for diagnostics, for example:

```bash
run-review.sh \
  --command-file <path> \
  --stdout-file <path> \
  --stderr-file <path> \
  --status-file <path> \
  --poll-seconds 10 \
  --soft-timeout-seconds 600 \
  --stall-warning-seconds 300 \
  --hard-timeout-seconds 1800
```

Template Guardrails

All four templates/task-plan.md files share identical core sections (14 ##-level headings) and an identical Status enum (10 values). Variant-specific guardrail language is permitted in the leading blockquote and in the Runtime field of the Metadata table.

Core sections (appear in every variant, same order):

  1. Metadata
  2. Prompt
  3. Interpretation
  4. Assumptions
  5. Files
  6. Approach
  7. TDD Approach
  8. Acceptance Criteria
  9. Verification
  10. Rollback
  11. Runtime State
  12. Review History
  13. Final Status
  14. Guardrails (do NOT remove)
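A quick guardrail check for one template, assuming each core section is a line of the form "## <name>" with exactly the names above. check_core_sections is hypothetical; it allows extra headings between the core ones but reports the first core heading that is missing or out of order.

```bash
# Hypothetical: verify the 14 core "## " headings appear in the listed order.
check_core_sections() {
  awk '
    BEGIN {
      n = split("Metadata|Prompt|Interpretation|Assumptions|Files|Approach|TDD Approach|Acceptance Criteria|Verification|Rollback|Runtime State|Review History|Final Status|Guardrails (do NOT remove)", want, "|")
      i = 1                                   # next expected core heading
    }
    /^## / && i <= n && substr($0, 4) == want[i] { i++ }
    END {
      if (i <= n) { print "missing or out of order: " want[i]; exit 1 }
    }
  ' "$1"
}
```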

Runtime State keys (same across all variants): plan_review_round, implementation_review_round, CODEX_PLAN_SESSION_ID, CODEX_IMPL_SESSION_ID, CURSOR_PLAN_SESSION_ID, CURSOR_IMPL_SESSION_ID, OPENCODE_PLAN_SESSION_ID, OPENCODE_IMPL_SESSION_ID, last_phase_entered, last_round_ts, last_scan_outcome_plan, last_scan_outcome_impl, verification_attempts, tests_added_count, tdd_used.

Variant Hardening Notes

Claude Code Hardening

  • Must invoke explicit required sub-skills via the Skill tool:
    • superpowers:brainstorming
    • superpowers:test-driven-development
    • superpowers:verification-before-completion
    • superpowers:finishing-a-development-branch
    • superpowers:using-git-worktrees (conditional)
  • Must enforce plan-mode file-write guard in Phase 4:
    • If currently in plan mode, instruct the user to exit plan mode before writing task-plan.md.

Codex Hardening

  • Must use native skill discovery from ~/.agents/skills/ (no CLI wrappers).
  • Must verify Superpowers skills symlink: ~/.agents/skills/superpowers -> ~/.codex/superpowers/skills
  • Must invoke required sub-skills with explicit announcements before any action.
  • Must track checklist-driven sub-skills with update_plan todos (Codex equivalent of TodoWrite).
  • Task subagents are unavailable — do the work directly and state the limitation.
  • Deprecated CLI commands (superpowers-codex bootstrap, use-skill) must NOT be used.
  • Helper paths: ~/.codex/skills/reviewer-runtime/....
  • No plan-mode guard (Codex has no plan-mode concept).

OpenCode Hardening

  • Must use OpenCode's native skill tool (not Claude's Skill tool syntax). OpenCode may load shared skill files from ~/.agents/skills/, but invocation is still OpenCode-native.
  • Phase 1 includes a Bootstrap Superpowers Context step that lists installed skills and confirms the required superpowers/<skill> set is discoverable before any other phase runs.
  • Must verify Superpowers skill discovery under ~/.agents/skills/superpowers or ~/.config/opencode/skills/superpowers.
  • Helper paths: ~/.config/opencode/skills/reviewer-runtime/....
  • OpenCode reviewer calls MUST use --agent plan (the built-in plan primary agent) for a read-only posture.
  • No plan-mode guard (OpenCode has no plan-mode concept).

Cursor Hardening

  • Must use Cursor-native discovery from .cursor/skills/, ~/.cursor/skills/, or installed Cursor plugin cache entries.
  • Must announce skill usage explicitly before invocation.
  • jq is a hard prerequisite.
  • Helper paths: .cursor/skills/reviewer-runtime/... preferred, ~/.cursor/skills/reviewer-runtime/... fallback.
  • Reviewer invocations MUST use --mode=ask --trust --output-format json. Never --mode=agent, never --force, never write-capable modes for reviewer calls.
  • No plan-mode guard (Cursor has no plan-mode concept).

Execution Workflow Rules

  • The skill works from ai_plan/YYYY-MM-DD-<slug>/task-plan.md as its single persistent artifact.
  • Current branch is the default; worktree is opt-in only through explicit trigger phrases.
  • Plan review completes before any implementation starts.
  • Phase 7 verification gate must pass before the implementation review starts.
  • The task commit is a single commit created in Phase 9.
  • The .gitignore infra commit (Phase 1) is explicitly separate from the task commit and is allowed even when the final task ends up aborted or failed.
  • No push without explicit yes from the user.
  • Secret scan runs per-payload with no caching.
  • One MAX_ROUNDS setting (default 10) applies to both loops, each with its own round counter (single mental model).
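The push rule can be enforced with an explicit prompt. This sketch assumes the answer must be exactly yes (case-sensitive); confirm_push is a hypothetical name.

```bash
# Hypothetical push gate: only the exact answer "yes" allows a push.
confirm_push() {
  local answer
  printf 'Push the task commit to the remote? Type "yes" to push: ' >&2
  IFS= read -r answer || return 1   # no input counts as "no"
  [ "$answer" = yes ]
}

# Example:
#   if confirm_push; then git push; else echo "Status: local-only"; fi
```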