# web-automation

Automated web browsing and scraping using Playwright-compatible CloakBrowser, with one-shot extraction and broader persistent automation under a single skill.

## What this skill is for

- One-shot extraction from one URL with JSON output
- Automating web workflows
- Authenticated session flows (logins/cookies)
- Extracting page content to markdown
- Working with bot-protected or dynamic pages

## Command selection

- Use `node skills/web-automation/scripts/extract.js ""` for one-shot extraction from a single URL
- Use `npx tsx scrape.ts ...` for markdown scraping modes
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` for interactive or authenticated flows

## Requirements

- Node.js 20+
- `pnpm`
- Network access to download the CloakBrowser binary on first use or via preinstall

## First-time setup

```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```

## Updating CloakBrowser

```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm up cloakbrowser playwright-core
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```

## System libraries (for OpenClaw Docker builds)

```bash
export OPENCLAW_DOCKER_APT_PACKAGES="ffmpeg jq curl libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2"
```

## Native module note

If `pnpm install` warns that build scripts were ignored for native modules such as `better-sqlite3` or `esbuild`, run:

```bash
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```

Without this, helper scripts may fail before launch because the native bindings are missing.

## Common commands

```bash
# One-shot JSON extraction
node skills/web-automation/scripts/extract.js "https://example.com"

# Browse a page with persistent profile
npx tsx browse.ts --url "https://example.com"

# Scrape markdown
npx tsx scrape.ts --url "https://example.com" --mode main --output page.md

# Authenticate flow
npx tsx auth.ts --url "https://example.com/login"

# General natural-language browser flow
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```

## One-shot extraction (`extract.js`)

Use `extract.js` when the task is just: open one URL, render it, and return structured content.

### Features

- JavaScript rendering
- lightweight stealth and bounded anti-bot shaping
- JSON-only output
- optional screenshot and saved HTML
- browser sandbox left enabled

### Options

```bash
WAIT_TIME=5000 node skills/web-automation/scripts/extract.js "https://example.com"
SCREENSHOT_PATH=/tmp/page.png node skills/web-automation/scripts/extract.js "https://example.com"
SAVE_HTML=true node skills/web-automation/scripts/extract.js "https://example.com"
HEADLESS=false node skills/web-automation/scripts/extract.js "https://example.com"
USER_AGENT="Mozilla/5.0 ..." node skills/web-automation/scripts/extract.js "https://example.com"
```

### Output fields

- `requestedUrl`
- `finalUrl`
- `title`
- `content`
- `metaDescription`
- `status`
- `elapsedSeconds`
- `challengeDetected`
- optional `screenshot`
- optional `htmlFile`

## Persistent browsing profile

`browse.ts`, `auth.ts`, `flow.ts`, and `scrape.ts` use a persistent CloakBrowser profile so sessions survive across runs.

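The profile is configured through environment variables, and this section lists both canonical `CLOAKBROWSER_*` names and legacy `CAMOUFOX_*` aliases. As a rough illustration of how a script might honor both, the sketch below reads a setting by suffix. The helper `readSetting` and the sample values are hypothetical, and the precedence shown (canonical first, legacy as fallback) is an assumption, since this README does not state the order the skill's scripts actually use.

```typescript
// Hypothetical helper for illustration; not taken from the skill's scripts.
type Env = Record<string, string | undefined>;

// Return the canonical CLOAKBROWSER_* value when it is defined,
// otherwise fall back to the legacy CAMOUFOX_* alias.
// Assumption: canonical names win; the README does not specify the order.
function readSetting(env: Env, suffix: string): string | undefined {
  return env[`CLOAKBROWSER_${suffix}`] ?? env[`CAMOUFOX_${suffix}`];
}

// Sample environment (made-up values) mixing canonical and legacy names.
const exampleEnv: Env = {
  CAMOUFOX_PROFILE_PATH: "/tmp/legacy-profile",
  CLOAKBROWSER_HEADLESS: "true",
  CAMOUFOX_HEADLESS: "false",
};

const profilePath = readSetting(exampleEnv, "PROFILE_PATH");
const headless = readSetting(exampleEnv, "HEADLESS");
console.log(profilePath, headless);
```

In a real script the first argument would typically be `process.env`. Note that `??` only falls back when the canonical variable is undefined, so an empty string still counts as set.
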
Canonical env vars:

- `CLOAKBROWSER_PROFILE_PATH`
- `CLOAKBROWSER_HEADLESS`
- `CLOAKBROWSER_USERNAME`
- `CLOAKBROWSER_PASSWORD`

Legacy aliases still supported for compatibility:

- `CAMOUFOX_PROFILE_PATH`
- `CAMOUFOX_HEADLESS`
- `CAMOUFOX_USERNAME`
- `CAMOUFOX_PASSWORD`

## Natural-language flow runner (`flow.ts`)

Use `flow.ts` when you want a general command style like:

- "go to this site"
- "find this button and click it"
- "type this and press enter"

### Example

```bash
npx tsx flow.ts --instruction 'go to https://example.com then click on "Sign in" then type "stef@example.com" in #email then press enter'
```

You can also use JSON steps for deterministic runs:

```bash
npx tsx flow.ts --steps '[{"action":"goto","url":"https://example.com"},{"action":"click","text":"Sign in"}]'
```
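
Hand-writing the `--steps` JSON inside shell quotes is easy to get wrong, so it may help to generate the argument from typed data. The sketch below is one possible way to do that. It models only the two step shapes shown in the example above (`goto` with `url`, `click` with `text`); the `Step` type and the `shellQuote` helper are hypothetical names, and any other actions `flow.ts` may accept are deliberately left out because their JSON shape is not documented here.

```typescript
// Only the two step shapes that appear in the README example are modeled.
type Step =
  | { action: "goto"; url: string }
  | { action: "click"; text: string };

const steps: Step[] = [
  { action: "goto", url: "https://example.com" },
  { action: "click", text: "Sign in" },
];

// Wrap a value in single quotes for a POSIX shell,
// escaping any embedded single quotes as '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Serialize the steps and assemble the command line for flow.ts.
const stepsJson = JSON.stringify(steps);
const command = `npx tsx flow.ts --steps ${shellQuote(stepsJson)}`;
console.log(command);
```

The printed command can be pasted into a shell. When spawning the process from Node instead, passing `stepsJson` as a separate argument avoids shell quoting entirely.
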