| name | description |
|---|---|
| web-automation | Browse and scrape web pages using Playwright-compatible CloakBrowser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or running multi-step browser flows. |
Web Automation with CloakBrowser (Codex)
Automated web browsing and scraping using Playwright-compatible CloakBrowser, with two execution paths:
- one-shot extraction via `extract.js`
- broader stateful automation via `auth.ts`, `browse.ts`, `flow.ts`, `scan-local-app.ts`, and `scrape.ts`
Requirements
- Node.js 20+
- pnpm
- Network access to download the CloakBrowser binary on first use
First-Time Setup
cd ~/.codex/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
Updating CloakBrowser
cd ~/.codex/skills/web-automation/scripts
pnpm up cloakbrowser playwright-core
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
Prerequisite Check (MANDATORY)
Before running automation, verify CloakBrowser and Playwright Core are installed and wired correctly.
cd ~/.codex/skills/web-automation/scripts
node check-install.js
If the check fails, stop and return:
"Missing dependency/config: web-automation requires cloakbrowser and playwright-core with CloakBrowser-based scripts. Run setup in this skill, then retry."
If runtime fails with missing native bindings for better-sqlite3 or esbuild, run:
cd ~/.codex/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
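The gate described in this section could be wrapped in a small helper. This is a minimal sketch, not part of the skill: it assumes `check-install.js` signals failure with a non-zero exit code (the text does not say how it reports failure), and `prerequisiteCheck`, `Runner`, and `scriptsDir` are hypothetical names. The caller passes an absolute scripts directory, because `spawnSync` does not expand `~`.

```typescript
import { spawnSync } from "node:child_process";

// Message the skill asks callers to return when the check fails (quoted from the text above).
const FAILURE_MESSAGE =
  "Missing dependency/config: web-automation requires cloakbrowser and playwright-core " +
  "with CloakBrowser-based scripts. Run setup in this skill, then retry.";

type CheckResult = { ok: true } | { ok: false; message: string };

// Runs the install check and returns its exit status (null if the process could not start).
type Runner = (scriptsDir: string) => number | null;

// ASSUMPTION: check-install.js exits non-zero when something is missing.
const runCheckInstall: Runner = (scriptsDir) =>
  spawnSync("node", ["check-install.js"], { cwd: scriptsDir, stdio: "inherit" }).status;

// Returns { ok: true } only when the check exits with status 0; any other status,
// including a failure to start the process, yields the skill's failure message.
function prerequisiteCheck(scriptsDir: string, run: Runner = runCheckInstall): CheckResult {
  return run(scriptsDir) === 0 ? { ok: true } : { ok: false, message: FAILURE_MESSAGE };
}
```

The `run` parameter exists only so the gate can be exercised without a real install; in normal use, call `prerequisiteCheck` with the scripts directory alone and stop if `ok` is false.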
When To Use Which Command
- Use `node extract.js "<URL>"` for a one-shot rendered fetch with JSON output.
- Use `npx tsx scrape.ts ...` when you need markdown extraction, Readability cleanup, or selector-based scraping.
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` when the task needs login handling, persistent sessions, clicks, typing, screenshots, or multi-step navigation.
- Use `npx tsx scan-local-app.ts` when you need a configurable local-app smoke pass driven by `SCAN_*` and `CLOAKBROWSER_*` environment variables.
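The guidance above can be read as a lookup from task needs to scripts. The sketch below encodes it that way; the `Need` labels, `SCRIPTS_BY_NEED`, and `scriptsFor` are hypothetical names introduced for illustration, and only the script names come from this document.

```typescript
// Hypothetical labels for the needs named in the list above.
type Need =
  | "json-fetch"
  | "markdown"
  | "readability"
  | "selector-scrape"
  | "login"
  | "persistent-session"
  | "interaction" // clicks, typing
  | "screenshots"
  | "multi-step"
  | "local-smoke-scan";

// The three scripts the document groups together for stateful work.
const STATEFUL: readonly string[] = ["browse.ts", "auth.ts", "flow.ts"];

const SCRIPTS_BY_NEED: Record<Need, readonly string[]> = {
  "json-fetch": ["extract.js"],
  "markdown": ["scrape.ts"],
  "readability": ["scrape.ts"],
  "selector-scrape": ["scrape.ts"],
  "login": STATEFUL,
  "persistent-session": STATEFUL,
  "interaction": STATEFUL,
  "screenshots": STATEFUL,
  "multi-step": STATEFUL,
  "local-smoke-scan": ["scan-local-app.ts"],
};

// Returns the candidate scripts for a set of needs, de-duplicated, in first-seen order.
function scriptsFor(needs: readonly Need[]): string[] {
  const seen = new Set<string>();
  for (const need of needs) {
    for (const script of SCRIPTS_BY_NEED[need]) {
      seen.add(script);
    }
  }
  return Array.from(seen);
}
```

For a task that needs stateful handling, the result lists all three stateful scripts; choosing among them is left to the caller, as it is in the list above.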
Quick Reference
- Install check: `node check-install.js`
- One-shot JSON extract: `node extract.js "https://example.com"`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`
- Local app smoke scan: `SCAN_BASE_URL=http://localhost:3000 SCAN_ROUTES=/,/dashboard npx tsx scan-local-app.ts`
Local App Smoke Scan
scan-local-app.ts is intentionally generic. Configure it with environment variables instead of editing the file:
- `SCAN_BASE_URL`
- `SCAN_LOGIN_PATH`
- `SCAN_USERNAME`
- `SCAN_PASSWORD`
- `SCAN_USERNAME_SELECTOR`
- `SCAN_PASSWORD_SELECTOR`
- `SCAN_SUBMIT_SELECTOR`
- `SCAN_ROUTES`
- `SCAN_REPORT_PATH`
- `SCAN_HEADLESS`
If SCAN_USERNAME or SCAN_PASSWORD is omitted, the script falls back to CLOAKBROWSER_USERNAME or CLOAKBROWSER_PASSWORD, respectively.
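To make the fallback concrete, here is a minimal sketch of how such variables could be resolved. It is not the code in `scan-local-app.ts`. The helper names are hypothetical, and three behaviours are assumptions: an empty value is treated the same as an omitted one, `SCAN_ROUTES` is a comma-separated list (as in the Quick Reference example) that defaults to `/`, and a missing `SCAN_BASE_URL` is an error.

```typescript
type Env = Record<string, string | undefined>;

interface ScanCredentials {
  username: string | undefined;
  password: string | undefined;
}

// Prefer SCAN_* and fall back to CLOAKBROWSER_* per variable.
// ASSUMPTION: an empty string counts as "omitted".
function resolveCredentials(env: Env): ScanCredentials {
  return {
    username: env.SCAN_USERNAME || env.CLOAKBROWSER_USERNAME,
    password: env.SCAN_PASSWORD || env.CLOAKBROWSER_PASSWORD,
  };
}

// Expand SCAN_ROUTES into absolute URLs under SCAN_BASE_URL.
// ASSUMPTION: comma-separated routes, default "/", base URL required.
function resolveRouteUrls(env: Env): string[] {
  const base = env.SCAN_BASE_URL;
  if (!base) {
    throw new Error("SCAN_BASE_URL is required");
  }
  const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
  const routes = (env.SCAN_ROUTES || "/")
    .split(",")
    .map((route) => route.trim())
    .filter((route) => route.length > 0);
  return routes.map((route) => {
    const path = route.charAt(0) === "/" ? route : "/" + route;
    return trimmedBase + path;
  });
}
```

In a Node script, both helpers would be called with `process.env`. Taking the environment as a parameter keeps them easy to check in isolation.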
Notes
- Sessions persist in CloakBrowser profile storage.
- Use `--wait` for dynamic pages.
- Use `--mode selector --selector "..."` for targeted extraction.
- `extract.js` keeps a bounded stealth/rendered fetch path without needing a long-lived automation session.
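When `scrape.ts` is driven from another script, the two modes mentioned in this document can be captured in a small argument builder. This sketch uses only flags that appear in the text (`--url`, `--mode`, `--selector`, `--output`). `ScrapeOptions` and `scrapeArgs` are hypothetical names, and accepting `--output` in selector mode is an assumption. `--wait` is left out because the text does not say whether it takes a value.

```typescript
// Either whole-page "main" extraction or targeted "selector" extraction.
type ScrapeOptions =
  | { url: string; mode: "main"; output?: string }
  | { url: string; mode: "selector"; selector: string; output?: string };

// Builds the argument list to pass to `npx` for a scrape.ts run.
function scrapeArgs(opts: ScrapeOptions): string[] {
  const args = ["tsx", "scrape.ts", "--url", opts.url, "--mode", opts.mode];
  if (opts.mode === "selector") {
    // Selector mode needs the CSS selector to extract.
    args.push("--selector", opts.selector);
  }
  if (opts.output !== undefined) {
    // ASSUMPTION: --output is accepted in both modes.
    args.push("--output", opts.output);
  }
  return args;
}
```

The resulting array can be handed to a process spawner with `npx` as the command and the scripts directory as the working directory. Passing arguments as an array avoids shell quoting problems with selectors.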