feat: add safe Playwright scraper skill

Stefano Fiorini
2026-03-10 19:07:30 -05:00
parent 60363f9f0c
commit 4b505e4421
10 changed files with 430 additions and 0 deletions


---
name: playwright-safe
description: Use when a page needs JavaScript rendering or moderate anti-bot handling and the agent should use a single local Playwright scraper instead of generic web fetch tooling.
---
# Playwright Safe
Single-entry Playwright scraper for dynamic or moderately bot-protected pages.
## When To Use
- Page content depends on client-side rendering
- Generic `scrape` or `webfetch` is likely to miss rendered content
- The task needs one direct page extraction with lightweight stealth behavior
## Do Not Use
- For multi-step browser workflows with login/stateful interaction
- For site-specific automation flows
- When the page can be handled by a simpler built-in fetch path
## Setup
```bash
cd ~/.openclaw/workspace/skills/playwright-safe
npm install
npx playwright install chromium
```
## Command
```bash
node scripts/playwright-safe.js "<URL>"
```
Only pass a user-provided `http` or `https` URL.
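The scheme restriction above can be enforced with the WHATWG `URL` parser built into Node. A minimal sketch, assuming a guard helper (the name `isAllowedUrl` is illustrative, not part of the script):

```javascript
// Illustrative URL guard: accept only http/https URLs, reject everything
// else (file:, javascript:, data:, or unparseable input).
// The helper name is hypothetical; the actual script may differ.
function isAllowedUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL at all
  }
  return url.protocol === "http:" || url.protocol === "https:";
}
```

Rejecting non-web schemes up front keeps the scraper from being pointed at local files or other protocol handlers.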
## Options
```bash
WAIT_TIME=5000 node scripts/playwright-safe.js "<URL>"
SCREENSHOT_PATH=/tmp/page.png node scripts/playwright-safe.js "<URL>"
SAVE_HTML=true node scripts/playwright-safe.js "<URL>"
HEADLESS=false node scripts/playwright-safe.js "<URL>"
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-safe.js "<URL>"
```
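Each option is an environment variable, so it can be resolved with plain `process.env` lookups. A sketch of how such defaults might be read; the function name and the default values are assumptions, not taken from the script:

```javascript
// Resolve scraper options from environment variables.
// Defaults here are illustrative assumptions; the real script may differ.
function resolveOptions(env = process.env) {
  return {
    waitTime: Number(env.WAIT_TIME ?? 5000),     // extra wait, presumably ms
    screenshotPath: env.SCREENSHOT_PATH ?? null, // write a PNG only if set
    saveHtml: env.SAVE_HTML === "true",          // opt-in raw HTML dump
    headless: env.HEADLESS !== "false",          // headless unless disabled
    userAgent: env.USER_AGENT ?? undefined,      // UA override only if set
  };
}
```

Boolean flags are compared against the literal strings `"true"`/`"false"` because environment variables are always strings.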
## Output
The script prints JSON only, suitable for direct agent consumption. Fields include:
- `requestedUrl`
- `finalUrl`
- `title`
- `content`
- `metaDescription`
- `status`
- `elapsedSeconds`
- `challengeDetected`
- optional `screenshot`
- optional `htmlFile`
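Because the script prints JSON only, a caller can capture stdout and parse it directly. A hedged sketch of consuming the result (field names come from the list above; the sample payload and helper name are illustrative):

```javascript
// Parse the scraper's JSON stdout and surface key fields.
// parseResult is a hypothetical caller-side helper, not part of the skill.
function parseResult(stdout) {
  const result = JSON.parse(stdout);
  if (result.challengeDetected) {
    // The page served a bot challenge; content may be incomplete.
    console.warn(`challenge detected at ${result.finalUrl}`);
  }
  return result;
}
```

In practice the caller would obtain `stdout` by spawning the script, e.g. with `child_process.execFile("node", ["scripts/playwright-safe.js", url], ...)`, and treat unparseable output as a failure.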
## Safety Notes
- Stealth and anti-bot shaping are retained
- Chromium sandbox remains enabled
- No sandbox-disabling flags are used
- No site-specific extractors or foreign tool dependencies are used
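The sandbox guarantee is checkable mechanically: the Chromium launch arguments must never include sandbox-disabling flags. A sketch of such a check (the argument list and helper name are hypothetical examples, not the skill's actual launch config):

```javascript
// Flags that would disable the Chromium sandbox and must never appear.
const FORBIDDEN_FLAGS = ["--no-sandbox", "--disable-setuid-sandbox"];

// Returns true when no argument disables the sandbox.
function sandboxIntact(launchArgs) {
  return launchArgs.every((arg) => !FORBIDDEN_FLAGS.includes(arg));
}
```

A stealth-oriented arg such as `--disable-blink-features=AutomationControlled` passes this check, since it shapes fingerprinting rather than weakening process isolation.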