Files
stefano 251148c3ff
check / check (ubuntu-latest) (push) Successful in 2m5s
check / check (macos-latest) (push) Has been cancelled
check-online / check-online (ubuntu-latest) (push) Successful in 1m53s
Perform code optimization and document cleanup (#1)
## Summary
- add repository-wide quality tooling and verification scaffolding, including CI workflows, pnpm workspace setup, ESLint/Prettier/markdown checks, and generated-output verification helpers
- reorganize skill sources and generation flow by introducing canonical `_source` variants, generator/manifests, reusable helper abstractions, and shared web-automation/browser utilities
- clean up and expand documentation so the root README flows into docs and skill docs, with clearer development, reviewer, installer, and workflow guidance

## Notable changes
- docs flow and consistency cleanup across `README.md`, `docs/README.md`, and related docs
- new scripts for `check`, docs verification, generated-file verification, shell portability, and safe directory replacement
- refactors in Atlassian and web-automation skill runtimes to reduce duplication and centralize reusable code
- changelog, development documentation, and CI surface updates

## Test Plan
- [ ] `pnpm run check`
- [ ] review generated/manifests and skill sync outputs
- [ ] smoke-check docs flow from `README.md` to `docs/README.md` to skill docs

## Notes
- this branch currently includes tracked `skills/web-automation/shared/node_modules` content that should be reviewed carefully as potentially noisy/accidental committed artifacts

Co-authored-by: Stefano Fiorini <stefano.fiorini@firsthorizon.com>
Reviewed-on: #1
2026-05-04 04:41:34 +00:00

115 lines
3.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: web-automation
description: Browse and scrape web pages using Playwright-compatible CloakBrowser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or running multi-step browser flows.
---
<!-- ⚠️ GENERATED FILE do not edit directly. Edit the canonical source in skills/web-automation/_source/cursor/SKILL.md and run `pnpm run sync:pi`. -->
# Web Automation with CloakBrowser (Cursor)
Automated web browsing and scraping using Playwright-compatible CloakBrowser with two execution paths:
- one-shot extraction via `extract.js`
- broader stateful automation via `auth.ts`, `browse.ts`, `flow.ts`, `scan-local-app.ts`, and `scrape.ts`
## Requirements
- Node.js 20+
- pnpm
- Network access to download the CloakBrowser binary on first use
## First-Time Setup
Repo-local install:
```bash
cd .cursor/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
Global install:
```bash
cd ~/.cursor/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## Updating CloakBrowser
Run from the installed `scripts/` directory:
```bash
pnpm up cloakbrowser playwright-core
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## Prerequisite Check (MANDATORY)
Before running automation, verify CloakBrowser and Playwright Core are installed and wired correctly.
```bash
cd .cursor/skills/web-automation/scripts || cd ~/.cursor/skills/web-automation/scripts
node check-install.js
```
If the check fails, stop and return:
"Missing dependency/config: web-automation requires `cloakbrowser` and `playwright-core` with CloakBrowser-based scripts. Run setup in this skill, then retry."
If runtime fails with missing native bindings for `better-sqlite3` or `esbuild`, run:
```bash
cd .cursor/skills/web-automation/scripts || cd ~/.cursor/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## When To Use Which Command
- Use `node extract.js "<URL>"` for a one-shot rendered fetch with JSON output.
- Use `npx tsx scrape.ts ...` when you need markdown extraction, Readability cleanup, or selector-based scraping.
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` when the task needs login handling, persistent sessions, clicks, typing, screenshots, or multi-step navigation.
- Use `npx tsx scan-local-app.ts` when you need a configurable local-app smoke pass driven by `SCAN_*` and `CLOAKBROWSER_*` environment variables.
## Quick Reference
- Install check: `node check-install.js`
- One-shot JSON extract: `node extract.js "https://example.com"`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`
- Local app smoke scan: `SCAN_BASE_URL=http://localhost:3000 SCAN_ROUTES=/,/dashboard npx tsx scan-local-app.ts`
## Local App Smoke Scan
`scan-local-app.ts` is intentionally generic. Configure it with environment variables instead of editing the file:
- `SCAN_BASE_URL`
- `SCAN_LOGIN_PATH`
- `SCAN_USERNAME`
- `SCAN_PASSWORD`
- `SCAN_USERNAME_SELECTOR`
- `SCAN_PASSWORD_SELECTOR`
- `SCAN_SUBMIT_SELECTOR`
- `SCAN_ROUTES`
- `SCAN_REPORT_PATH`
- `SCAN_HEADLESS`
If `SCAN_USERNAME` or `SCAN_PASSWORD` are omitted, the script falls back to `CLOAKBROWSER_USERNAME` and `CLOAKBROWSER_PASSWORD`.
## Notes
- Sessions persist in CloakBrowser profile storage.
- Use `--wait` for dynamic pages.
- Use `--mode selector --selector "..."` for targeted extraction.
- `extract.js` keeps a bounded stealth/rendered fetch path without needing a long-lived automation session.