stef-openclaw-skills/skills/web-automation/SKILL.md
2026-03-13 09:33:53 -05:00

name: web-automation
description: Browse and scrape web pages using Playwright-compatible CloakBrowser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or scraping websites with bot protection.

Web Automation with CloakBrowser (Codex)

Automated web browsing and scraping using Playwright-compatible CloakBrowser with two execution paths under one skill:

  • one-shot extraction via extract.js
  • broader stateful automation via CloakBrowser and the auth.ts, browse.ts, flow.ts, and scrape.ts scripts

When To Use Which Command

  • Use node scripts/extract.js "<URL>" for one-shot extraction from a single URL when you need rendered content, bounded stealth behavior, and JSON output.
  • Use npx tsx scrape.ts ... when you need markdown output, Readability extraction, full-page cleanup, or selector-based scraping.
  • Use npx tsx browse.ts ..., auth.ts, or flow.ts when the task needs interactive navigation, persistent sessions, login handling, click/type actions, or multi-step workflows.
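
The guidance above can be read as a small decision rule. The sketch below is illustrative only: the TaskNeeds shape and the pickCommand helper are hypothetical names that do not exist in this skill, and the precedence (interactive needs first, then scraping needs) is an assumption.

```typescript
// Hypothetical description of a task; these fields are illustrative and are
// not read by any script in this skill.
interface TaskNeeds {
  interactive: boolean;        // login, click/type actions, multi-step navigation
  markdownOrSelector: boolean; // markdown output, Readability, or selector scraping
}

// Map the needs to the entry point suggested in the list above.
// Assumption: interactive needs take precedence over scraping needs.
function pickCommand(needs: TaskNeeds): string {
  if (needs.interactive) {
    return "npx tsx browse.ts"; // or auth.ts / flow.ts, depending on the task
  }
  if (needs.markdownOrSelector) {
    return "npx tsx scrape.ts";
  }
  // Default: one-shot rendered content with JSON output.
  return "node scripts/extract.js";
}
```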

Requirements

  • Node.js 20+
  • pnpm
  • Network access to download the CloakBrowser binary on first use or via preinstall

First-Time Setup

cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild

Updating CloakBrowser

cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm up cloakbrowser playwright-core
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild

Prerequisite Check (MANDATORY)

Before running any automation, verify CloakBrowser and Playwright Core dependencies are installed and scripts are configured to use CloakBrowser.

cd ~/.openclaw/workspace/skills/web-automation/scripts
node --input-type=module -e "await import('cloakbrowser');import 'playwright-core';console.log('OK: cloakbrowser + playwright-core installed')"
node -e "const fs=require('fs');const t=fs.readFileSync('browse.ts','utf8');if(!/import\s*\{[^}]*launchPersistentContext[^}]*\}\s*from\s*['\"]cloakbrowser['\"]/.test(t)){throw new Error('browse.ts is not configured for CloakBrowser')}console.log('OK: CloakBrowser integration detected in browse.ts')"

If any check fails, stop and return:

"Missing dependency/config: web-automation requires cloakbrowser and playwright-core with CloakBrowser-based scripts. Run setup in this skill, then retry."

If runtime fails with missing native bindings for better-sqlite3 or esbuild, run:

cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
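
For reference, the browse.ts check in the prerequisite step is a regular-expression test on the file's source text. A minimal sketch of the same test as a reusable function follows; the pattern is copied from the command above (with shell escaping removed), while the usesCloakBrowser name is hypothetical.

```typescript
// Same pattern as the prerequisite command: a named import of
// launchPersistentContext from the "cloakbrowser" package.
const CLOAKBROWSER_IMPORT =
  /import\s*\{[^}]*launchPersistentContext[^}]*\}\s*from\s*['"]cloakbrowser['"]/;

// Returns true when the given source text contains that import.
// Hypothetical helper; the skill itself runs the check inline via node -e.
function usesCloakBrowser(source: string): boolean {
  return CLOAKBROWSER_IMPORT.test(source);
}
```

Because the character class [^}]* also matches newlines, an import list that is split across several lines still passes the check.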

Quick Reference

  • One-shot JSON extract: node scripts/extract.js "https://example.com"
  • Browse page: npx tsx browse.ts --url "https://example.com"
  • Scrape markdown: npx tsx scrape.ts --url "https://example.com" --mode main --output page.md
  • Authenticate: npx tsx auth.ts --url "https://example.com/login"
  • Natural-language flow: npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'

One-Shot Extraction

Use extract.js when you need a single page fetch with JavaScript rendering and lightweight anti-bot shaping, but not a full automation session.

node scripts/extract.js "https://example.com"
WAIT_TIME=5000 node scripts/extract.js "https://example.com"
SCREENSHOT_PATH=/tmp/page.png SAVE_HTML=true node scripts/extract.js "https://example.com"

Output is JSON only and includes fields such as:

  • requestedUrl
  • finalUrl
  • title
  • content
  • metaDescription
  • status
  • elapsedSeconds
  • challengeDetected
  • optional screenshot
  • optional htmlFile
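
A caller might consume this output roughly as follows. This is a sketch under assumptions: the field names come from the list above, but the TypeScript types (for example, status as a number and challengeDetected as a boolean) and the summarizeExtract helper are guesses for illustration, not part of extract.js.

```typescript
// Field names are from the list above; the types are assumptions.
interface ExtractResult {
  requestedUrl: string;
  finalUrl: string;
  title: string;
  content: string;
  metaDescription: string;
  status: number;
  elapsedSeconds: number;
  challengeDetected: boolean;
  screenshot?: string; // optional
  htmlFile?: string;   // optional
}

// Parse the JSON text printed by extract.js and reduce it to a one-line summary.
// Hypothetical helper for illustration.
function summarizeExtract(stdout: string): string {
  const result = JSON.parse(stdout) as ExtractResult;
  if (result.challengeDetected) {
    return `challenge detected at ${result.finalUrl}`;
  }
  const redirected = result.finalUrl !== result.requestedUrl;
  const suffix = redirected ? ", redirected" : "";
  return `${result.title} (${result.content.length} chars${suffix})`;
}
```

Checking challengeDetected before using content avoids treating a bot-protection interstitial as the page's real text.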

General Flow Runner

Use flow.ts for multi-step commands in plain language (go/click/type/press/wait/screenshot).

Example:

npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
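
To make the instruction format concrete, the sketch below splits an instruction into individual steps on the word "then", ignoring any "then" that appears inside double quotes. It is not flow.ts's actual parser, whose behavior is not shown here; splitSteps is a hypothetical helper.

```typescript
// Split a plain-language instruction into steps on " then ",
// skipping separators that appear inside double-quoted text.
// Hypothetical sketch; flow.ts may parse instructions differently.
function splitSteps(instruction: string): string[] {
  const separator = " then ";
  const steps: string[] = [];
  let current = "";
  let inQuotes = false;
  let i = 0;
  while (i < instruction.length) {
    const ch = instruction.charAt(i);
    if (ch === '"') {
      inQuotes = !inQuotes; // toggle on every double quote
    }
    if (!inQuotes && instruction.startsWith(separator, i)) {
      steps.push(current.trim());
      current = "";
      i += separator.length;
      continue;
    }
    current += ch;
    i += 1;
  }
  if (current.trim() !== "") {
    steps.push(current.trim());
  }
  return steps;
}
```

Applied to the example above, this yields four steps: the go step, the type step, the press step, and the wait step.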

Compatibility Aliases

  • CAMOUFOX_PROFILE_PATH still works as a legacy alias for CLOAKBROWSER_PROFILE_PATH
  • CAMOUFOX_HEADLESS still works as a legacy alias for CLOAKBROWSER_HEADLESS
  • CAMOUFOX_USERNAME and CAMOUFOX_PASSWORD still work as legacy aliases for CLOAKBROWSER_USERNAME and CLOAKBROWSER_PASSWORD
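
One way such aliasing can work is a lookup that tries the current name first and then the legacy one. The sketch below assumes the CLOAKBROWSER_* value wins when both are set; that precedence, the Env type, and the resolveSetting helper are assumptions for illustration, not the scripts' confirmed behavior.

```typescript
// Environment-like map; a caller could pass process.env here.
type Env = Record<string, string | undefined>;

// Pairs taken from the list above: [current name, legacy alias].
const ALIASES: ReadonlyArray<readonly [string, string]> = [
  ["CLOAKBROWSER_PROFILE_PATH", "CAMOUFOX_PROFILE_PATH"],
  ["CLOAKBROWSER_HEADLESS", "CAMOUFOX_HEADLESS"],
  ["CLOAKBROWSER_USERNAME", "CAMOUFOX_USERNAME"],
  ["CLOAKBROWSER_PASSWORD", "CAMOUFOX_PASSWORD"],
];

// Resolve a setting by its current name, falling back to the legacy alias.
// Assumption: the current name takes precedence when both are set.
function resolveSetting(env: Env, name: string): string | undefined {
  const direct = env[name];
  if (direct !== undefined) {
    return direct;
  }
  const pair = ALIASES.find(([current]) => current === name);
  return pair ? env[pair[1]] : undefined;
}
```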

Notes

  • Sessions persist in CloakBrowser profile storage.
  • Use --wait for dynamic pages.
  • Use --mode selector --selector "..." for targeted extraction.
  • extract.js applies stealth and bounded anti-bot shaping while leaving the browser sandbox enabled.