---
name: web-automation
description: Browse and scrape web pages using Playwright with Camoufox anti-detection browser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or scraping websites with bot protection.
---

# Web Automation with Camoufox (Codex)

Automated web browsing and scraping using Playwright, with two execution paths under one skill:

- one-shot extraction via `extract.js`
- broader stateful automation via Camoufox and the existing `auth.ts`, `browse.ts`, `flow.ts`, and `scrape.ts`

## When To Use Which Command

- Use `node scripts/extract.js "<URL>"` for one-shot extraction from a single URL when you need rendered content, bounded stealth behavior, and JSON output.
- Use `npx tsx scrape.ts ...` when you need markdown output, Readability extraction, full-page cleanup, or selector-based scraping.
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` when the task needs interactive navigation, persistent sessions, login handling, click/type actions, or multi-step workflows.

## Requirements

- Node.js 20+
- pnpm
- Network access to download browser binaries

## First-Time Setup

```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm install
npx playwright install chromium
npx camoufox-js fetch
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```

## Prerequisite Check (MANDATORY)

Before running any automation, verify that the Playwright and Camoufox dependencies are installed and that the scripts are configured to use Camoufox.

```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node -e "require.resolve('playwright/package.json');require.resolve('playwright-core/package.json');require.resolve('camoufox-js/package.json');console.log('OK: playwright + playwright-core + camoufox-js installed')"
node -e "const fs=require('fs');const t=fs.readFileSync('browse.ts','utf8');if(!/camoufox-js/.test(t)){throw new Error('browse.ts is not configured for Camoufox')}console.log('OK: Camoufox integration detected in browse.ts')"
```

If any check fails, stop and return:

"Missing dependency/config: web-automation requires `playwright`, `playwright-core`, and `camoufox-js` with Camoufox-based scripts. Run setup in this skill, then retry."

If runtime fails with missing native bindings for `better-sqlite3` or `esbuild`, run:

```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```

## Quick Reference

- One-shot JSON extract: `node scripts/extract.js "https://example.com"`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`

## One-shot extraction

Use `extract.js` when you need a single page fetch with JavaScript rendering and lightweight anti-bot shaping, but not a full automation session.

```bash
node scripts/extract.js "https://example.com"
WAIT_TIME=5000 node scripts/extract.js "https://example.com"
SCREENSHOT_PATH=/tmp/page.png SAVE_HTML=true node scripts/extract.js "https://example.com"
```

Output is JSON only and includes fields such as:

- `requestedUrl`
- `finalUrl`
- `title`
- `content`
- `metaDescription`
- `status`
- `elapsedSeconds`
- `challengeDetected`
- optional `screenshot`
- optional `htmlFile`

## General flow runner

Use `flow.ts` for multi-step commands in plain language (go/click/type/press/wait/screenshot).

Example:

```bash
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```

## Notes

- Sessions persist in Camoufox profile storage.
- Use `--wait` for dynamic pages.
- Use `--mode selector --selector "..."` for targeted extraction.
- `extract.js` keeps stealth and bounded anti-bot shaping while keeping the Chromium sandbox enabled.
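
Because `extract.js` prints JSON only, a caller can inspect the listed fields to decide whether the one-shot result is usable or whether the task should move to the stateful scripts. The following is a minimal sketch, not part of the skill: the helper name `triageExtractOutput` and its return shape are hypothetical, and it assumes `status` is a numeric HTTP status and `challengeDetected` is a boolean, since the field list above names the fields but not their types.

```javascript
// Hypothetical helper (not shipped with the skill): triage the JSON text
// printed by extract.js. Assumed types: `status` is a number,
// `challengeDetected` is a boolean, `content` is a string.
function triageExtractOutput(stdoutText) {
  let result;
  try {
    result = JSON.parse(stdoutText);
  } catch (err) {
    return { action: "fail", reason: "output was not valid JSON" };
  }

  // A detected challenge suggests the one-shot path was blocked.
  if (result.challengeDetected === true) {
    return { action: "escalate", reason: "bot challenge detected" };
  }

  // Treat HTTP error statuses as failures (assumes a numeric status).
  if (typeof result.status === "number" && result.status >= 400) {
    return { action: "fail", reason: `HTTP status ${result.status}` };
  }

  // Empty content may mean the page needs interaction or a longer wait.
  if (typeof result.content !== "string" || result.content.trim() === "") {
    return { action: "escalate", reason: "no rendered content" };
  }

  return {
    action: "use",
    reason: "content extracted",
    title: result.title ?? "",
    finalUrl: result.finalUrl ?? result.requestedUrl ?? "",
  };
}

// Made-up payload for illustration only; real output comes from extract.js.
const sample = JSON.stringify({
  requestedUrl: "https://example.com",
  finalUrl: "https://example.com/",
  title: "Example Domain",
  content: "Example Domain. This domain is for use in examples.",
  status: 200,
  challengeDetected: false,
});
console.log(triageExtractOutput(sample).action);
```

In this sketch, `escalate` is a hint to retry with `browse.ts` or `flow.ts`, in line with the guidance under "When To Use Which Command"; the thresholds and action names are illustrative choices.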