# DO-TASK

## Purpose

Execute a single user-supplied prompt end-to-end with **two reviewer loops** (plan review +
implementation review), with TDD-first execution, a pre-implementation verification gate, and a
single task commit — all in one run of the skill. `do-task` is scoped to small-to-medium ad-hoc
tasks; for multi-milestone work use `create-plan` + `implement-plan` instead.

`do-task` persists one plan artifact per run: `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`. The
folder is kept as a record after success (not deleted). Resume is supported via the `Status` enum
and Runtime State fields.

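As a rough illustration of the artifact path above, the sketch below builds a dated plan path from a prompt title. The `slugify` and `plan_path` helpers and the slug rule (lowercase, non-alphanumerics collapsed to `-`) are assumptions made for this example; the skill's actual slug derivation is not specified here.

```bash
# Hypothetical helper: turn a free-form title into a lowercase, dash-separated slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Hypothetical helper: build ai_plan/YYYY-MM-DD-<slug>/task-plan.md for today's date.
plan_path() {
  printf 'ai_plan/%s-%s/task-plan.md\n' "$(date +%F)" "$(slugify "$1")"
}

plan_path 'Fix Login Bug!'
```

Keeping the date in the folder name means a resumed run can only find its folder if it matches on the slug rather than on today's date.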
## Requirements

- Git repo with `/ai_plan/` entry in `.gitignore` (the skill adds the entry automatically if
  missing and commits it as a separate infra commit).
- Superpowers skills installed from: https://github.com/obra/superpowers
- Required dependencies (vary by variant; see Install below):
  - `superpowers:brainstorming` (or `superpowers/brainstorming` for OpenCode)
  - `superpowers:test-driven-development`
  - `superpowers:verification-before-completion`
  - `superpowers:finishing-a-development-branch`
  - `superpowers:using-git-worktrees` (only when the prompt opts in to a worktree)
- For Codex, native skill discovery must be configured:
  - `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- Cursor can use the Cursor Superpowers plugin cache or manual `.cursor/skills/superpowers/skills`
  / `~/.cursor/skills/superpowers/skills` installs, and `jq` is a hard prerequisite for the
  Cursor variant.
- OpenCode can use `~/.agents/skills/superpowers` or `~/.config/opencode/skills/superpowers`.
- Shared reviewer runtime (`run-review.sh`) AND Telegram notifier helper (`notify-telegram.sh`)
  must be installed beside agent skills. Both scripts ship under `skills/reviewer-runtime/` in this
  repo and must be copied into the per-variant location:
  - Codex: `~/.codex/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
  - Claude Code: `~/.claude/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
  - OpenCode: `~/.config/opencode/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
  - Cursor: `.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}` (repo-local,
    preferred) or `~/.cursor/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}`
    (global fallback)
  - Pi: `.pi/skills/reviewer-runtime/pi/{run-review.sh,notify-telegram.sh}` (repo-local) or
    `~/.pi/agent/skills/reviewer-runtime/pi/{run-review.sh,notify-telegram.sh}` (global)
- Variant-specific prerequisites:
  - **Claude Code:** `claude --version`, explicit `Skill`-tool invocation of sub-skills.
  - **Codex:** `codex --version`; `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills` symlink present.
  - **Cursor:** `cursor-agent --version`, `jq --version` (hard prereq), Superpowers available
    from the Cursor plugin cache or manual Cursor skill roots.
  - **OpenCode:** `opencode --version`; Superpowers available from `~/.agents/skills/superpowers`
    or `~/.config/opencode/skills/superpowers`; Phase 1 runs Bootstrap Superpowers Context.
- Telegram notification setup is documented in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)

Dependency-missing messages are variant-specific:

- **Claude Code:** `Missing dependency: [specific missing item]. Install required Superpowers
  skills (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
- **Codex:** `Missing dependency: [specific missing item]. Install required Superpowers skills
  (https://github.com/obra/superpowers) and the reviewer-runtime helper, then retry.`
- **Cursor:** `Missing dependency: [specific missing item]. Install Cursor Agent CLI, jq, and the
  Cursor Superpowers plugin or Superpowers skills under .cursor/skills/ or ~/.cursor/skills/,
  then retry.`
- **OpenCode:** `Missing dependency: [specific missing item]. Install required OpenCode
  Superpowers skills (https://github.com/obra/superpowers, OpenCode setup) and the
  reviewer-runtime helper, then retry.`
- **Pi:** `Missing dependency: [specific missing item]. Install Pi, required Superpowers skills,
  and the Pi reviewer-runtime helper, then retry.`

### Reviewer CLI Requirements

The canonical reviewer CLI support matrix is documented in
[REVIEWERS.md](./REVIEWERS.md). One of these CLIs must be installed to drive either of the two
review loops:

| Reviewer CLI | Install | Verify | Read-Only Mode | Session Resume |
|---|---|---|---|---|
| `codex` | `npm install -g @openai/codex` | `codex --version` | `-s read-only` | Yes (`codex exec resume <id>`) |
| `claude` | `npm install -g @anthropic-ai/claude-code` | `claude --version` | `--strict-mcp-config --setting-sources user` | No (fresh call each round) |
| `cursor` | `curl https://cursor.com/install -fsS \| bash` | `cursor-agent --version` (binary: `cursor-agent`; alias `cursor agent` also works) | `--mode=ask` | Yes (`--resume <id>`) |
| `opencode` | `brew install opencode` or your package manager | `opencode --version` | `--agent plan` | Opt-in (`-s <id>`; fresh call is the default) |
| `pi` | Install Pi coding agent | `pi --version`; list models with `pi --list-models [search]` | `--tools read,grep,find,ls` | No (fresh call each round) |

The reviewer CLI is independent of which agent is running the skill. For example, Claude Code can
send both the plan and the implementation to Codex for review.

**Additional dependency for `cursor` reviewer:** `jq` is required to parse Cursor's JSON output.
Install via `brew install jq` (macOS) or your package manager. Verify: `jq --version`. The Cursor
variant of `do-task` makes `jq` a hard prerequisite regardless of which reviewer CLI is selected.

## Install

### Codex

```bash
mkdir -p ~/.codex/skills/do-task
cp -R skills/do-task/codex/* ~/.codex/skills/do-task/
mkdir -p ~/.codex/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.codex/skills/reviewer-runtime/
chmod +x ~/.codex/skills/reviewer-runtime/*.sh
```

### Claude Code

```bash
mkdir -p ~/.claude/skills/do-task
cp -R skills/do-task/claude-code/* ~/.claude/skills/do-task/
mkdir -p ~/.claude/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.claude/skills/reviewer-runtime/
chmod +x ~/.claude/skills/reviewer-runtime/*.sh
```

### OpenCode

```bash
mkdir -p ~/.config/opencode/skills/do-task
cp -R skills/do-task/opencode/* ~/.config/opencode/skills/do-task/
mkdir -p ~/.config/opencode/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.config/opencode/skills/reviewer-runtime/
chmod +x ~/.config/opencode/skills/reviewer-runtime/*.sh
```

### Cursor

Copy into the repo-local `.cursor/skills/` directory (where the Cursor Agent CLI discovers skills):

```bash
mkdir -p .cursor/skills/do-task
cp -R skills/do-task/cursor/* .cursor/skills/do-task/
mkdir -p .cursor/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh .cursor/skills/reviewer-runtime/
chmod +x .cursor/skills/reviewer-runtime/*.sh
```

Or install globally (loaded via `~/.cursor/skills/`):

```bash
mkdir -p ~/.cursor/skills/do-task
cp -R skills/do-task/cursor/* ~/.cursor/skills/do-task/
mkdir -p ~/.cursor/skills/reviewer-runtime
cp skills/reviewer-runtime/run-review.sh skills/reviewer-runtime/notify-telegram.sh ~/.cursor/skills/reviewer-runtime/
chmod +x ~/.cursor/skills/reviewer-runtime/*.sh
```

### Pi

Recommended full Pi package install:

```bash
./scripts/install-pi-package.sh --global
# or, for project-local Pi package install
./scripts/install-pi-package.sh --local
```

Manual single-skill Pi install from the package mirror:

```bash
pnpm run sync:pi
mkdir -p .pi/skills/do-task
cp -R pi-package/skills/do-task/* .pi/skills/do-task/
mkdir -p .pi/skills/reviewer-runtime/pi
cp -R skills/reviewer-runtime/pi/* .pi/skills/reviewer-runtime/pi/
chmod +x .pi/skills/reviewer-runtime/pi/*.sh
```

Global manual installs use `~/.pi/agent/skills/do-task/` and `~/.pi/agent/skills/reviewer-runtime/pi/` instead of `.pi/skills/...`.

Pi workflow skills also require Superpowers. See [PI-SUPERPOWERS.md](./PI-SUPERPOWERS.md) and [PI-COMMON-REVIEWER.md](./PI-COMMON-REVIEWER.md).

## Verify Installation

Run the per-variant checks for everything the corresponding `SKILL.md` enforces. Each check is
structured: (1) CLI binary version, (2) skill file presence, (3) reviewer-runtime + notifier
helper presence, (4) Superpowers sub-skill discovery, (5) variant-specific extras.

### Codex Verify

```bash
codex --version
test -f ~/.codex/skills/do-task/SKILL.md
test -x ~/.codex/skills/reviewer-runtime/run-review.sh
test -x ~/.codex/skills/reviewer-runtime/notify-telegram.sh
test -L ~/.agents/skills/superpowers
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```

### Claude Code Verify

```bash
claude --version
test -f ~/.claude/skills/do-task/SKILL.md
test -x ~/.claude/skills/reviewer-runtime/run-review.sh
test -x ~/.claude/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.claude/skills/superpowers/brainstorming/SKILL.md
test -f ~/.claude/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.claude/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.claude/skills/superpowers/finishing-a-development-branch/SKILL.md
```

### OpenCode Verify

```bash
opencode --version
test -f ~/.config/opencode/skills/do-task/SKILL.md
test -x ~/.config/opencode/skills/reviewer-runtime/run-review.sh
test -x ~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md || test -f ~/.config/opencode/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.config/opencode/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.config/opencode/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.config/opencode/skills/superpowers/finishing-a-development-branch/SKILL.md
```

### Cursor Verify

```bash
cursor-agent --version
jq --version
test -f .cursor/skills/do-task/SKILL.md || test -f ~/.cursor/skills/do-task/SKILL.md
test -x .cursor/skills/reviewer-runtime/run-review.sh || test -x ~/.cursor/skills/reviewer-runtime/run-review.sh
test -x .cursor/skills/reviewer-runtime/notify-telegram.sh || test -x ~/.cursor/skills/reviewer-runtime/notify-telegram.sh
test -f .cursor/skills/superpowers/skills/brainstorming/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/brainstorming/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/brainstorming/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/test-driven-development/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/test-driven-development/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/test-driven-development/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/verification-before-completion/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/verification-before-completion/SKILL.md' -print -quit 2>/dev/null | grep -q .
test -f .cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || test -f ~/.cursor/skills/superpowers/skills/finishing-a-development-branch/SKILL.md || find ~/.cursor/plugins/cache/cursor-public/superpowers -path '*/skills/finishing-a-development-branch/SKILL.md' -print -quit 2>/dev/null | grep -q .
```

### Pi Verify

```bash
pi --version
test -f .pi/skills/do-task/SKILL.md || test -f ~/.pi/agent/skills/do-task/SKILL.md
test -x .pi/skills/reviewer-runtime/pi/run-review.sh || test -x ~/.pi/agent/skills/reviewer-runtime/pi/run-review.sh
test -x .pi/skills/reviewer-runtime/pi/notify-telegram.sh || test -x ~/.pi/agent/skills/reviewer-runtime/pi/notify-telegram.sh
test -f .pi/skills/superpowers/brainstorming/SKILL.md || test -f ~/.pi/agent/skills/superpowers/brainstorming/SKILL.md || test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md
test -f .pi/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.pi/agent/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md
test -f .pi/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.pi/agent/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f .pi/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.pi/agent/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
```

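The per-variant checks repeat the same four sub-skill tests against different roots. As a convenience sketch (not part of the skill itself), a small loop can report which required sub-skills are missing under a given Superpowers root. The function name `missing_superpowers_skills` is hypothetical; the skill names and the `<root>/<skill>/SKILL.md` layout are taken from the checks above.

```bash
# Hypothetical helper: print each required sub-skill whose SKILL.md is missing under "$1".
missing_superpowers_skills() {
  local root="$1" skill
  for skill in \
    brainstorming \
    test-driven-development \
    verification-before-completion \
    finishing-a-development-branch
  do
    [ -f "$root/$skill/SKILL.md" ] || printf '%s\n' "$skill"
  done
}

# Example: an empty result means all four sub-skills were found under this root.
missing_superpowers_skills "$HOME/.agents/skills/superpowers"
```

For variants that accept more than one root (OpenCode, Cursor, Pi), the function can be called once per root; a sub-skill is only truly missing if every root reports it.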
## Key Behavior

- Creates one persistent plan artifact at `ai_plan/YYYY-MM-DD-<slug>/task-plan.md`.
- Ensures `/ai_plan/` is in `.gitignore`. If missing, adds it and creates a separate
  `chore(gitignore): ignore ai_plan local planning artifacts` commit.
- Parses the user prompt, detects the trigger phrase, and asks 1-3 clarifying questions unless
  the prompt already has a concrete target + outcome + unambiguous scope + resolvable identifiers.
- Invokes `superpowers:brainstorming` for any behavior-changing task (feature creation,
  non-trivial bug fix, refactor, design decision). The only skip conditions are
  `pure-documentation` and `pure-comment-whitespace-rename`.
- Asks which reviewer CLI, model, and max rounds to use (or accepts `skip` for no review).
  "Use defaults" maps to `codex / gpt-5.4 / MAX_ROUNDS=10`.
- Runs the plan review loop (Phase 5) before implementation, iterating up to `MAX_ROUNDS`
  (default 10) or until the reviewer returns `VERDICT: APPROVED`.
- Executes with TDD-first (Phase 6) via `superpowers:test-driven-development`. Auto-skip
  permitted only for `pure-documentation` and `pure-comment-whitespace-rename`; all other skips
  (including config-file additions) require explicit user approval, recorded in the TDD Approach
  section with an ISO-8601 timestamp.
- Runs lint/typecheck/tests as a **verification gate** (Phase 7) before the implementation review loop.
- Runs the implementation review loop (Phase 8) against the diff + verification output,
  iterating up to `MAX_ROUNDS` or until `APPROVED`.
- Scans every outbound reviewer payload for secrets (subroutine step 1a). Per-payload, no caching.
- Creates a **single commit** after the implementation review approves. Does NOT push. Asks the
  user for explicit `yes` before any push.
- Defaults to the **current branch**. Worktree only on explicit opt-in (`"in a worktree"`,
  `"use a worktree"`, `"on an isolated branch"`, `"on a new branch called X"`).
- Supports resume: detects existing folder by slug and uses `Status` + Runtime State to decide how to re-enter.
- Sends completion notifications through Telegram only when the shared setup in
  [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) is installed and configured.

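The `.gitignore` behavior above can be sketched as an idempotent check-then-append. This is an illustrative approximation, not the skill's actual code: the function name `ensure_ai_plan_ignored` and its `added` / `present` output are assumptions, while the `/ai_plan/` entry and the commit message come from the list above.

```bash
# Hypothetical helper: make sure "/ai_plan/" is listed in the given .gitignore.
# Prints "added" when the entry was appended, "present" when it already existed.
ensure_ai_plan_ignored() {
  local gitignore="${1:-.gitignore}"
  if [ -f "$gitignore" ] && grep -qxF '/ai_plan/' "$gitignore"; then
    echo "present"
    return 0
  fi
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the new entry does not get glued onto the last existing line.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >>"$gitignore"
  fi
  printf '%s\n' '/ai_plan/' >>"$gitignore"
  echo "added"
}

# When the entry was added, the separate infra commit would follow, for example:
#   git add .gitignore
#   git commit -m 'chore(gitignore): ignore ai_plan local planning artifacts'
```

Matching the whole line with `grep -x` avoids treating a similar entry such as `docs/ai_plan/` as already covering the planning folder.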
## Dual Review Loops

`do-task` runs the reviewer twice per successful run, with separate session IDs so reviewer context never leaks across loops.

1. **Plan review loop (Phase 5)** — payload is the current `task-plan.md` with `Runtime State`
   and `Review History` stripped. The reviewer evaluates whether the plan matches the prompt,
   whether assumptions are surfaced, whether acceptance criteria are testable, whether the TDD
   approach is appropriate, and whether there are missing files/risks/security concerns.
2. **Implementation review loop (Phase 8)** — payload is the approved task plan (without Runtime
   State) + `git diff` (unstaged + staged) + verification output (lint, typecheck, tests). The
   reviewer evaluates correctness, code quality, test coverage, security, and regression risk.

Both loops share the same 9-step subroutine and the same `MAX_ROUNDS` counter (default 10).

### Subroutine Steps (inside each review loop)

1. Write payload to `/tmp/do-task-<kind>-<REVIEW_ID>.md`.
2. **Secret scan (step 1a)** — per-payload, no caching. See Secret Scan section below.
3. Generate reviewer command script at `/tmp/do-task-<kind>-review-<REVIEW_ID>.sh`.
4. Run via `reviewer-runtime/run-review.sh`.
5. Promote reviewer output and capture the session ID on Round 1; persist it to `task-plan.md`
   Runtime State under the loop-specific variable (`CODEX_PLAN_SESSION_ID`,
   `CODEX_IMPL_SESSION_ID`, `CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`,
   `OPENCODE_PLAN_SESSION_ID`, or `OPENCODE_IMPL_SESSION_ID`).
6. Parse verdict; append an entry to Review History; bump the round counter.
7. Branch: `APPROVED` → exit, `REVISE` → caller revises and re-enters, `MAX_ROUNDS` → caller decides.
8. Liveness contract: wait while `In progress N` heartbeats arrive from the runner.
9. Cleanup temp artifacts on success.

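The round-counting and branching in steps 6 and 7 can be pictured with a minimal loop. This is a sketch under assumptions: `review_loop` and the reviewer callback are hypothetical names, and the callback here simply prints `APPROVED` or `REVISE`, standing in for the real payload, scan, and reviewer invocation.

```bash
# Hypothetical sketch of the round loop: call a reviewer function until it
# approves or the round limit is hit. The reviewer function receives the
# round number and prints a verdict (APPROVED or REVISE).
review_loop() {
  local reviewer_fn="$1" max_rounds="${MAX_ROUNDS:-10}" round=1 verdict
  while [ "$round" -le "$max_rounds" ]; do
    verdict=$("$reviewer_fn" "$round")
    case "$verdict" in
      APPROVED)
        echo "approved after $round round(s)"
        return 0
        ;;
      REVISE)
        # In the real flow the caller revises the plan or code here.
        round=$((round + 1))
        ;;
      *)
        echo "unexpected verdict: $verdict" >&2
        return 2
        ;;
    esac
  done
  echo "max rounds reached"
  return 1
}
```

A stub reviewer that approves on the third round, for example, would make `review_loop` report `approved after 3 round(s)`; a reviewer that never approves ends with `max rounds reached`, at which point the caller decides what to do.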
### Reviewer Output Contract

- `P0` = total blocker
- `P1` = major risk
- `P2` = must-fix before approval
- `P3` = cosmetic / nice to have
- Each severity section uses `- None.` when empty.
- `VERDICT: APPROVED` is valid only when no `P0`, `P1`, or `P2` findings remain.
- `P3` findings are non-blocking, but the caller should still try to fix them when cheap and safe.

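One way to enforce the contract above is to cross-check an `APPROVED` verdict against the severity sections. The sketch below assumes a review file in which each severity has a `### P0` ... `### P3` heading followed by bullet findings; that heading layout and the function name `approval_is_valid` are assumptions for illustration, not the documented reviewer format.

```bash
# Hypothetical check: succeed only if the review says VERDICT: APPROVED and
# the P0, P1, and P2 sections contain no bullet other than "- None.".
approval_is_valid() {
  local review_file="$1"
  grep -q '^VERDICT: APPROVED' "$review_file" || return 1
  awk '
    /^### P[0-3]/ { sev = $2; next }   # entering a severity section
    /^#/          { sev = ""; next }   # any other heading ends the section
    sev ~ /^P[012]$/ && /^- / && $0 != "- None." { blocking = 1 }
    END { exit (blocking ? 1 : 0) }
  ' "$review_file"
}
```

With this check, a review that lists a `P1` finding but still ends in `VERDICT: APPROVED` is rejected, while `P3` findings never block approval.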
## Runtime Artifacts

Per review loop (`<kind>` = `plan` or `implementation`):

- `/tmp/do-task-<kind>-<REVIEW_ID>.md` — payload
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.md` — normalized review text
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.json` — raw JSON (cursor always; opencode with `--format json`)
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.stderr` — reviewer stderr
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.status` — helper heartbeat/status log
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.runner.out` — helper-managed stdout
- `/tmp/do-task-<kind>-review-<REVIEW_ID>.sh` — reviewer command script

Status log lines use this format:

```text
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
```

`in-progress` is the liveness heartbeat emitted roughly once per minute with `note="In progress N"`.
`stall-warning` is a non-terminal status-log state only. It does not mean the caller should
stop waiting if `in-progress` heartbeats continue.

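Because each status line is a flat list of `key=value` pairs, a single field can be pulled out with a short `awk` filter. The helper name `status_field` is hypothetical, and the sample line is made up for illustration; note that this simple whitespace split only returns the first word of a quoted value such as `note="In progress 1"`.

```bash
# Hypothetical helper: print the value of one key=value field from a status line.
# Usage: status_field "<line>" <key>
status_field() {
  printf '%s\n' "$1" | awk -v key="$2" '{
    for (i = 1; i <= NF; i++) {
      if (index($i, key "=") == 1) {      # field starts with "key="
        print substr($i, length(key) + 2)
        exit
      }
    }
  }'
}

# Example with an illustrative status line:
line='ts=2025-01-01T00:00:00Z level=info state=in-progress elapsed_s=60 pid=123 stdout_bytes=0 stderr_bytes=0 note="In progress 1"'
status_field "$line" state
```

Reading `state` from the last line of the `.status` file is enough to tell a live `in-progress` heartbeat apart from a `stall-warning` entry.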
### Persistent Artifact

The one file kept across runs is `ai_plan/<slug>/task-plan.md`. Its `Status` enum drives resume decisions:

| Status | Meaning |
|---|---|
| `draft` | Newly created; plan review not yet started |
| `plan-approved` | Plan review loop returned APPROVED |
| `implementation-in-progress` | Phase 6 executing |
| `implementation-approved` | Phase 8 review loop returned APPROVED; awaiting commit |
| `pushed` | Committed + pushed to remote |
| `local-only` | Committed locally; user declined push |
| `aborted-plan-review` | MAX_ROUNDS reached in Phase 5; user aborted |
| `aborted-impl-review` | MAX_ROUNDS reached in Phase 8; user aborted |
| `aborted-verification` | Phase 7 retries exhausted; user aborted |
| `failed` | Hard tooling failure |

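For resume purposes, the ten `Status` values split into in-flight states and terminal outcomes (the same six values that trigger a Telegram notification). A `case` statement is a compact way to express that split; the function name `is_terminal_status` is an assumption for this sketch, and how the skill actually re-enters from each in-flight state is not shown.

```bash
# Hypothetical helper: succeed when the Status value is a terminal outcome,
# fail when the run is still in flight (or the value is unknown).
is_terminal_status() {
  case "$1" in
    pushed|local-only|aborted-plan-review|aborted-impl-review|aborted-verification|failed)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

# Example: only in-flight runs are candidates for resume.
if is_terminal_status "plan-approved"; then
  echo "nothing to resume"
else
  echo "resume candidate"
fi
```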
## Failure Handling

- `completed-empty-output` — the reviewer exited without producing review text; surface
  `.stderr` and `.status`, then retry only after diagnosing the cause.
- `needs-operator-decision` — the helper reached hard-timeout escalation; surface `.status`
  and decide whether to extend the timeout, abort, or retry with different parameters.
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds
  retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
- Verification gate (Phase 7) retries up to 3 times. On exhaustion, `Status` becomes
  `aborted-verification` and the user is asked whether to retry, override, or abort.
- As long as fresh `in-progress` heartbeats continue to arrive roughly once per minute, the caller keeps waiting.

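The bounded retry of the verification gate can be sketched as a simple loop around whatever command runs lint, typecheck, and tests. The function name `run_verification_gate` is hypothetical, and the sketch treats "up to 3 times" as three attempts in total, which is an interpretation; it also omits the fixes the skill would apply between attempts.

```bash
# Hypothetical sketch: run the given verification command up to 3 times.
# Prints the outcome and returns 0 on success, 1 when attempts are exhausted.
run_verification_gate() {
  local max_attempts=3 attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "verification passed on attempt $attempt"
      return 0
    fi
    # In the real flow, failures would be diagnosed and fixed before retrying.
    attempt=$((attempt + 1))
  done
  echo "aborted-verification"
  return 1
}
```

It would be called with the project's own check command, for example `run_verification_gate pnpm run check` in a repository that defines such a script.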
## Secret Scan (subroutine step 1a; per-payload; no caching)

Every outbound reviewer payload is scanned **before** being sent to the reviewer CLI. This scan
runs on every round of both loops. No results are cached, because the Phase 8 payload includes
newly-introduced diff content that earlier rounds never saw.

Canonical anchored regex list (10 patterns):

```text
AWS access key:      AKIA[0-9A-Z]{16}
GCP service-acct:    "type"\s*:\s*"service_account"
GitHub tokens:       (ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
Slack tokens:        xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
                     xox[abpsr]-[A-Za-z0-9]{10,48}
OpenAI API keys:     sk-(proj-)?[A-Za-z0-9_-]{20,}
Anthropic API keys:  sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
PEM private keys:    -----BEGIN [A-Z ]+ PRIVATE KEY-----
.env-style:          (TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)\s*=\s*["']?[A-Za-z0-9+/=_-]{8,}
JWT:                 eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
```

If a match is found, the skill **redacts the matched text before showing it to the user** using
the fixed token `[REDACTED:<pattern-label>:<match-length>-chars]` (pattern labels:
`aws-access-key`, `gcp-service-account`, `github-token`, `slack-token`, `openai-key`,
`anthropic-key`, `pem-private-key`, `dotenv-style`, `jwt`). File paths and line numbers are kept.
Raw match text is never echoed to terminal, chat log, or any persistent file.

The user answers `yes` / `no` / `redact`:

- `yes` — proceed; Runtime State records `last_scan_outcome_<kind>=user-approved-with-matches`.
- `redact` — the user supplies redactions, the skill applies them, and re-scans before sending. Runtime State records `last_scan_outcome_<kind>=redacted-and-approved`.
- `no` — stop the loop, set `Status: failed`, send Telegram summary.

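The redaction token format can be illustrated with one pattern from the list. The sketch below is an approximation, not the skill's scanner: `redact_line` is a hypothetical helper, it handles a single pattern per call, and it relies on `grep -oE`, which is widely available but not strictly POSIX. The sample key is the placeholder AWS uses in its own documentation.

```bash
# Hypothetical helper: replace every match of one pattern in a line with the
# fixed token [REDACTED:<label>:<match-length>-chars].
# Usage: redact_line "<line>" <label> '<extended-regex>'
redact_line() {
  local line="$1" label="$2" pattern="$3" match prefix suffix
  match=$(printf '%s\n' "$line" | grep -oE -e "$pattern" | head -n 1)
  while [ -n "$match" ]; do
    prefix=${line%%"$match"*}   # text before the first occurrence
    suffix=${line#*"$match"}    # text after the first occurrence
    line="${prefix}[REDACTED:${label}:${#match}-chars]${suffix}"
    match=$(printf '%s\n' "$line" | grep -oE -e "$pattern" | head -n 1)
  done
  printf '%s\n' "$line"
}

redact_line 'export AWS_KEY=AKIAIOSFODNN7EXAMPLE # demo' aws-access-key 'AKIA[0-9A-Z]{16}'
```

Only the match length survives in the output, so the user can judge whether the hit looks like a real credential without the raw value ever being printed.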
## Supported Reviewer CLIs

| CLI | Round-1 command | Round-N resume | Output capture |
|---|---|---|---|
| `codex` | `codex exec -m <model> -s read-only -o <out.md> "<prompt>"` | `codex exec resume <session-id> -o <out.md> "<prompt>"` | `<out.md>` directly (helper `--success-file`) |
| `claude` | `claude -p "<prompt>" --model <model> --strict-mcp-config --setting-sources user` | Fresh call with prior-round context summary | `cp <runner.out> <out.md>` |
| `cursor` | `cursor-agent -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `cursor-agent --resume <id> -p --mode=ask --model <model> --trust --output-format json "<prompt>" > <out.json>` | `jq -r '.result' <out.json> > <out.md>` |
| `opencode` | `opencode run -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` | Fresh call (default) OR `opencode run -s <id> -m <provider>/<model> --agent plan --format json "<prompt>" > <out.json>` (opt-in) | `jq -r '.[] \| select(.type == "message" and .role == "assistant") \| .content' <out.json> > <out.md>` |
| `pi` | See [PI-COMMON-REVIEWER.md](./PI-COMMON-REVIEWER.md) | Fresh call | Markdown stdout copied to `<out.md>` |

For all supported reviewer CLIs, the preferred execution path is:

1. Write the reviewer command to a bash script.
2. Run that script through `reviewer-runtime/run-review.sh`.
3. Fall back to direct synchronous execution only if the helper is missing or not executable.

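The three-step execution path can be sketched as a wrapper that prefers the helper and only falls back when it cannot be executed. The function name `run_reviewer_script` and the `<base>` file-naming convention are assumptions for this example; the helper flags are the ones shown in this document's `run-review.sh` invocation.

```bash
# Hypothetical wrapper: run a reviewer command script through the helper when
# it is executable, otherwise execute the script directly and synchronously.
# Usage: run_reviewer_script <helper-path> <command-script> <artifact-base>
run_reviewer_script() {
  local helper="$1" script="$2" base="$3"
  if [ -x "$helper" ]; then
    "$helper" \
      --command-file "$script" \
      --stdout-file "$base.runner.out" \
      --stderr-file "$base.stderr" \
      --status-file "$base.status"
  else
    # Fallback: no heartbeat or timeout handling, just capture the output.
    bash "$script" >"$base.runner.out" 2>"$base.stderr"
  fi
}
```

The fallback loses the status log and timeout escalation, which is why it is reserved for the case where the helper is missing or not executable.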
## Pi Reviewer Support

All workflow variants can use Pi itself as a reviewer CLI. Use `pi/<pi-model-name>` shorthand,
for example `pi/claude-opus-4-7`; this means `REVIEWER_CLI=pi` and
`REVIEWER_MODEL=claude-opus-4-7`. Provider-qualified or multi-slash Pi model IDs are preserved
after the first `pi/` prefix, for example `pi/anthropic/claude-opus-4-7`.

The canonical isolated read-only Pi reviewer flag contract lives in
[PI-COMMON-REVIEWER.md](./PI-COMMON-REVIEWER.md). This workflow passes the plan and
implementation review payload at `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` and expects the
standard `## Summary`, `## Findings`, and `## Verdict` response. Pi reviewer output is captured
as markdown stdout, not JSON.

If the Pi reviewer model or provider is unavailable, surface the helper stderr/status and use
`pi --list-models [search]` to inspect configured models.

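The `pi/<pi-model-name>` shorthand maps cleanly onto shell prefix stripping: removing only the first `pi/` leaves any provider-qualified model ID intact. The function name `parse_reviewer_spec` is hypothetical; `REVIEWER_CLI` and `REVIEWER_MODEL` are the variables named above.

```bash
# Hypothetical parser for the pi/<model> shorthand.
# Sets REVIEWER_CLI and REVIEWER_MODEL; fails for specs without the pi/ prefix.
parse_reviewer_spec() {
  case "$1" in
    pi/?*)
      REVIEWER_CLI=pi
      REVIEWER_MODEL=${1#pi/}   # strip only the first "pi/" prefix
      ;;
    *)
      return 1
      ;;
  esac
}

parse_reviewer_spec 'pi/anthropic/claude-opus-4-7' \
  && echo "$REVIEWER_CLI $REVIEWER_MODEL"
```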
## Notifications

- Telegram is the only supported notification path.
- Shared setup: [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
- Notification failures are non-blocking, but they must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, the skill sends a Telegram summary first if configured.
- Terminal outcomes that trigger Telegram: `pushed`, `local-only`, `aborted-plan-review`,
  `aborted-impl-review`, `aborted-verification`, `failed`.

The reviewer-runtime helper also supports manual override flags for diagnostics:

```bash
run-review.sh \
  --command-file <path> \
  --stdout-file <path> \
  --stderr-file <path> \
  --status-file <path> \
  --poll-seconds 10 \
  --soft-timeout-seconds 600 \
  --stall-warning-seconds 300 \
  --hard-timeout-seconds 1800
```

## Template Guardrails

All four `templates/task-plan.md` files share identical core sections (14 `##`-level headings)
and identical Status enum (10 values). Variant-specific guardrail language is permitted in the
leading blockquote and in the `Runtime` field of the Metadata table.

**Core sections** (appear in every variant, same order):

1. Metadata
2. Prompt
3. Interpretation
4. Assumptions
5. Files
6. Approach
7. TDD Approach
8. Acceptance Criteria
9. Verification
10. Rollback
11. Runtime State
12. Review History
13. Final Status
14. Guardrails (do NOT remove)

**Runtime State keys** (same across all variants): `plan_review_round`,
`implementation_review_round`, `CODEX_PLAN_SESSION_ID`, `CODEX_IMPL_SESSION_ID`,
`CURSOR_PLAN_SESSION_ID`, `CURSOR_IMPL_SESSION_ID`, `OPENCODE_PLAN_SESSION_ID`,
`OPENCODE_IMPL_SESSION_ID`, `last_phase_entered`, `last_round_ts`, `last_scan_outcome_plan`,
`last_scan_outcome_impl`, `verification_attempts`, `tests_added_count`, `tdd_used`.

## Variant Hardening Notes

### Claude Code Hardening

- Must explicitly invoke the required sub-skills via the `Skill` tool:
  - `superpowers:brainstorming`
  - `superpowers:test-driven-development`
  - `superpowers:verification-before-completion`
  - `superpowers:finishing-a-development-branch`
  - `superpowers:using-git-worktrees` (conditional)
- Must enforce plan-mode file-write guard in Phase 4:
  - If currently in plan mode, instruct user to exit plan mode before writing `task-plan.md`.

### Codex Hardening

- Must use native skill discovery from `~/.agents/skills/` (no CLI wrappers).
- Must verify Superpowers skills symlink: `~/.agents/skills/superpowers -> ~/.codex/superpowers/skills`
- Must invoke required sub-skills with explicit announcements before any action.
- Must track checklist-driven sub-skills with `update_plan` todos (Codex equivalent of `TodoWrite`).
- `Task` subagents are unavailable — do the work directly and state the limitation.
- Deprecated CLI commands (`superpowers-codex bootstrap`, `use-skill`) must NOT be used.
- Helper paths: `~/.codex/skills/reviewer-runtime/...`.
- No plan-mode guard (Codex has no plan-mode concept).

### OpenCode Hardening

- Must use OpenCode's native skill tool (not Claude's `Skill` tool syntax). OpenCode may load
  shared skill files from `~/.agents/skills/`, but invocation is still OpenCode-native.
- Phase 1 includes a Bootstrap Superpowers Context step that lists installed skills and confirms
  the required `superpowers/<skill>` set is discoverable before any other phase runs.
- Must verify Superpowers skill discovery under `~/.agents/skills/superpowers` or `~/.config/opencode/skills/superpowers`.
- Helper paths: `~/.config/opencode/skills/reviewer-runtime/...`.
- OpenCode reviewer calls MUST use `--agent plan` (the built-in plan primary agent) for read-only posture.
- No plan-mode guard (OpenCode has no plan-mode concept).

### Cursor Hardening

- Must use Cursor-native discovery from `.cursor/skills/`, `~/.cursor/skills/`, or installed Cursor plugin cache entries.
- Must announce skill usage explicitly before invocation.
- `jq` is a hard prerequisite.
- Helper paths: `.cursor/skills/reviewer-runtime/...` preferred, `~/.cursor/skills/reviewer-runtime/...` fallback.
- Reviewer invocations MUST use `--mode=ask --trust --output-format json`. Never `--mode=agent`,
  never `--force`, never write-capable modes for reviewer calls.
- No plan-mode guard (Cursor has no plan-mode concept).

## Execution Workflow Rules

- The skill works from `ai_plan/YYYY-MM-DD-<slug>/task-plan.md` as its single persistent artifact.
- Current branch is the default; worktree is opt-in only through explicit trigger phrases.
- Plan review completes before any implementation starts.
- Phase 7 verification gate must pass before the implementation review starts.
- The task commit is a single commit created in Phase 9.
- The `.gitignore` infra commit (Phase 1) is explicitly separate from the task commit and is
  allowed even when the final task ends up `aborted` or `failed`.
- No push without explicit `yes` from the user.
- Secret scan runs per-payload with no caching.
- `MAX_ROUNDS=10` is shared across both loops (single mental model).

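The worktree opt-in rule amounts to a phrase match on the prompt. The sketch below checks for the four trigger phrases listed under Key Behavior; the function name `wants_worktree` and the case-insensitive matching are assumptions, since the document does not say how strictly the phrases are matched.

```bash
# Hypothetical check: succeed only when the prompt contains one of the
# explicit worktree trigger phrases (matched case-insensitively here).
wants_worktree() {
  local prompt
  prompt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$prompt" in
    *"in a worktree"*|*"use a worktree"*|*"on an isolated branch"*|*"on a new branch called "*)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

if wants_worktree 'Fix the login bug in a worktree'; then
  echo "worktree requested"
else
  echo "stay on current branch"
fi
```

Anything that does not match falls through to the default, so an ambiguous prompt keeps the work on the current branch.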