Files
ai-coding-skills/pi-package/skills/web-automation/SKILL.md
T
stefano 251148c3ff
check / check (ubuntu-latest) (push) Successful in 2m5s
check / check (macos-latest) (push) Has been cancelled
check-online / check-online (ubuntu-latest) (push) Successful in 1m53s
Perform code optimization and document cleanup (#1)
## Summary
- add repository-wide quality tooling and verification scaffolding, including CI workflows, pnpm workspace setup, ESLint/Prettier/markdown checks, and generated-output verification helpers
- reorganize skill sources and generation flow by introducing canonical `_source` variants, generator/manifests, reusable helper abstractions, and shared web-automation/browser utilities
- clean up and expand documentation so the root README flows into docs and skill docs, with clearer development, reviewer, installer, and workflow guidance

## Notable changes
- docs flow and consistency cleanup across `README.md`, `docs/README.md`, and related docs
- new scripts for `check`, docs verification, generated-file verification, shell portability, and safe directory replacement
- refactors in Atlassian and web-automation skill runtimes to reduce duplication and centralize reusable code
- changelog, development documentation, and CI surface updates

## Test Plan
- [ ] `pnpm run check`
- [ ] review generated/manifests and skill sync outputs
- [ ] smoke-check docs flow from `README.md` to `docs/README.md` to skill docs

## Notes
- this branch currently includes tracked `skills/web-automation/shared/node_modules` content that should be reviewed carefully as potentially noisy/accidental committed artifacts

Co-authored-by: Stefano Fiorini <stefano.fiorini@firsthorizon.com>
Reviewed-on: #1
2026-05-04 04:41:34 +00:00

125 lines
4.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: web-automation
description: Browse and scrape web pages using Playwright-compatible CloakBrowser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or running multi-step browser flows.
---
<!-- ⚠️ GENERATED FILE do not edit directly. Edit the canonical source in skills/web-automation/_source/pi/SKILL.md and run `pnpm run sync:pi`. -->
# Web Automation with CloakBrowser (Pi)
Automated web browsing and scraping for pi using the shared runtime bundle in `scripts/`.
## Requirements
- Node.js 20+
- `pnpm`
- Network access to download the CloakBrowser binary on first use
## First-Time Setup
Global install:
```bash
mkdir -p ~/.pi/agent/skills/web-automation
cp -R skills/web-automation/pi/* ~/.pi/agent/skills/web-automation/
cd ~/.pi/agent/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
Project-local install:
```bash
mkdir -p .pi/skills/web-automation
cp -R skills/web-automation/pi/* .pi/skills/web-automation/
cd .pi/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
Pi can also load this repo through settings or package installs as documented in [docs/PI.md](../../../docs/PI.md).
If you installed this repo from a local checkout with `./scripts/install-pi-package.sh`, the runtime stays in the checkout mirror at `pi-package/skills/web-automation/scripts`.
## Updating CloakBrowser
Run inside the installed `scripts/` directory for the pi skill. The commands below work for both global and project-local installs as long as you run them from the installed `scripts/` directory.
```bash
pnpm up cloakbrowser playwright-core
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## Prerequisite Check (MANDATORY)
Before running automation, verify the runtime from the location that matches your install style:
- local checkout package install: `pi-package/skills/web-automation/scripts`
- project-local copied install: `.pi/skills/web-automation/scripts`
- global copied install: `~/.pi/agent/skills/web-automation/scripts`
```bash
cd pi-package/skills/web-automation/scripts
node check-install.js
```
If the check fails, stop and return:
`Missing dependency/config: web-automation requires cloakbrowser and playwright-core with CloakBrowser-based scripts. Run setup in this skill, then retry.`
If runtime fails with missing native bindings for `better-sqlite3` or `esbuild`, run the same commands from your installed `scripts/` directory:
```bash
cd pi-package/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## When To Use Which Command
- Use `node extract.js "<URL>"` for a one-shot rendered fetch with JSON output.
- Use `npx tsx scrape.ts ...` when you need markdown extraction, Readability cleanup, or selector-based scraping.
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` when the task needs login handling, persistent sessions, clicks, typing, screenshots, or multi-step navigation.
- Use `npx tsx scan-local-app.ts` when you need a configurable local-app smoke pass driven by `SCAN_*` and `CLOAKBROWSER_*` environment variables.
## Quick Reference
- Install check: `node check-install.js`
- One-shot JSON extract: `node extract.js "https://example.com"`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`
- Local app smoke scan: `SCAN_BASE_URL=http://localhost:3000 SCAN_ROUTES=/,/dashboard npx tsx scan-local-app.ts`
## Local App Smoke Scan
`scan-local-app.ts` is intentionally generic. Configure it with environment variables instead of editing the file:
- `SCAN_BASE_URL`
- `SCAN_LOGIN_PATH`
- `SCAN_USERNAME`
- `SCAN_PASSWORD`
- `SCAN_USERNAME_SELECTOR`
- `SCAN_PASSWORD_SELECTOR`
- `SCAN_SUBMIT_SELECTOR`
- `SCAN_ROUTES`
- `SCAN_REPORT_PATH`
- `SCAN_HEADLESS`
If `SCAN_USERNAME` or `SCAN_PASSWORD` are omitted, the script falls back to `CLOAKBROWSER_USERNAME` and `CLOAKBROWSER_PASSWORD`.
## Notes
- Sessions persist in CloakBrowser profile storage.
- Use `--wait` for dynamic pages.
- Use `--mode selector --selector "..."` for targeted extraction.
- `extract.js` keeps a bounded stealth/rendered fetch path without needing a long-lived automation session.
- Package installs use the repo's `pi-package/skills/web-automation/` mirror so the installed skill directory name matches `web-automation`.