# playwright-safe

Single-entry Playwright scraper for one-shot page extraction with JavaScript rendering and moderate anti-bot handling.

## What this skill is for

- Extracting title, visible text, and metadata from one URL
- Pages that need client-side rendering
- Moderate anti-bot shaping without a full browser automation workflow
- Structured JSON output that agents can consume directly

## What this skill is not for

- Multi-step browser workflows
- Authenticated login flows
- Interactive click/type sequences across multiple pages

Use `web-automation` for those broader browser tasks.

## Runtime requirements

- Node.js 18+
- Local Playwright install under the skill directory

## First-time setup

```bash
cd ~/.openclaw/workspace/skills/playwright-safe
npm install
npx playwright install chromium
```

## Entry point

```bash
node skills/playwright-safe/scripts/playwright-safe.js "<URL>"
```

Only pass a user-provided `http` or `https` URL.

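
A caller can enforce the `http`/`https` rule before invoking the script. The following is a minimal sketch, assuming a hypothetical helper named `isAllowedUrl` that is not part of the skill; it relies only on the built-in `URL` class available in Node.js 18+.

```javascript
// Hypothetical pre-flight check (not part of playwright-safe):
// accept only well-formed http or https URLs.
function isAllowedUrl(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws a TypeError on malformed input
  } catch {
    return false;
  }
  // URL normalizes the scheme to lowercase and keeps the trailing colon.
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}

console.log(isAllowedUrl("https://example.com/page")); // true
console.log(isAllowedUrl("file:///etc/passwd"));       // false
console.log(isAllowedUrl("not a url"));                // false
```

Parsing with `URL` instead of matching a string prefix also rejects schemes such as `javascript:` and `data:`.
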
## Options

Options are set through environment variables:

```bash
WAIT_TIME=5000 node skills/playwright-safe/scripts/playwright-safe.js "<URL>"
SCREENSHOT_PATH=/tmp/page.png node skills/playwright-safe/scripts/playwright-safe.js "<URL>"
SAVE_HTML=true node skills/playwright-safe/scripts/playwright-safe.js "<URL>"
HEADLESS=false node skills/playwright-safe/scripts/playwright-safe.js "<URL>"
USER_AGENT="Mozilla/5.0 ..." node skills/playwright-safe/scripts/playwright-safe.js "<URL>"
```

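
To illustrate how environment variables like these could be turned into typed settings, here is a rough sketch. The function `readOptions`, its return shape, and every default shown are assumptions made for illustration; the actual script may parse these variables differently.

```javascript
// Hypothetical sketch of reading the documented environment variables.
// Parsing rules and defaults here are assumptions, not the script's behavior.
function readOptions(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? "", 10);
  return {
    // null means "not set"; the script's real default is not documented here.
    waitTime: Number.isFinite(waitTime) && waitTime >= 0 ? waitTime : null,
    screenshotPath: env.SCREENSHOT_PATH || null,
    // Assumed: HTML is saved only when SAVE_HTML is exactly "true".
    saveHtml: env.SAVE_HTML === "true",
    // Assumed: headless unless HEADLESS is exactly "false".
    headless: env.HEADLESS !== "false",
    userAgent: env.USER_AGENT || null,
  };
}

console.log(readOptions({ WAIT_TIME: "5000", HEADLESS: "false" }));
```

Passing the environment in as a parameter keeps the function easy to exercise without touching the real `process.env`.
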
## Output

The script prints JSON only. It includes:

- `requestedUrl`
- `finalUrl`
- `title`
- `content`
- `metaDescription`
- `status`
- `elapsedSeconds`
- `challengeDetected`
- optional `screenshot`
- optional `htmlFile`

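
A consumer of this output might parse it and confirm that the non-optional fields are present before using it. The sketch below assumes a hypothetical helper `parseResult`; the field names come from the list above, while the sample values and their types are made up for illustration, since the field types are not specified here.

```javascript
// Field names are taken from the list above; types are not documented,
// so this sketch checks only that each non-optional field is present.
const REQUIRED_FIELDS = [
  "requestedUrl", "finalUrl", "title", "content",
  "metaDescription", "status", "elapsedSeconds", "challengeDetected",
];

// Hypothetical helper: parse the script's stdout and verify its shape.
function parseResult(stdout) {
  const result = JSON.parse(stdout); // throws SyntaxError on non-JSON input
  if (result === null || typeof result !== "object") {
    throw new Error("Expected a JSON object");
  }
  const missing = REQUIRED_FIELDS.filter((key) => !(key in result));
  if (missing.length > 0) {
    throw new Error(`Missing fields: ${missing.join(", ")}`);
  }
  return result;
}

// Made-up sample values, for illustration only.
const sample = JSON.stringify({
  requestedUrl: "https://example.com",
  finalUrl: "https://example.com/",
  title: "Example Domain",
  content: "Example text",
  metaDescription: "",
  status: 200,
  elapsedSeconds: 1.2,
  challengeDetected: false,
});

const parsed = parseResult(sample);
console.log(parsed.title, parsed.challengeDetected);
```

An agent could branch on `challengeDetected` after this check, for example to retry later or report that the page was blocked.
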
## Security posture

- Keeps lightweight stealth and anti-bot shaping
- Keeps the browser sandbox enabled
- Does not use `--no-sandbox`
- Does not use `--disable-setuid-sandbox`
- Avoids site-specific extractors and cross-skill dependencies

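
One way to keep the sandbox guarantees above from regressing is to build the browser launch options through a guard. This is a sketch under stated assumptions: `buildLaunchOptions` and `FORBIDDEN_FLAGS` are hypothetical names, and nothing here is taken from the skill's source. The `headless`, `args`, and `chromiumSandbox` keys are existing Playwright launch options.

```javascript
// Hypothetical guard (not from the skill's source): refuse launch
// arguments that would disable the Chromium sandbox.
const FORBIDDEN_FLAGS = ["--no-sandbox", "--disable-setuid-sandbox"];

function buildLaunchOptions({ headless = true, args = [] } = {}) {
  const blocked = args.filter((arg) => FORBIDDEN_FLAGS.includes(arg));
  if (blocked.length > 0) {
    throw new Error(`Refusing sandbox-disabling flags: ${blocked.join(", ")}`);
  }
  // chromiumSandbox: true asks Playwright to keep Chromium sandboxing on.
  return { headless, chromiumSandbox: true, args };
}

console.log(buildLaunchOptions({ args: ["--disable-gpu"] }));
```

The returned object could be passed to Playwright's `chromium.launch()`; the guard fails loudly instead of silently dropping a forbidden flag.
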