Refresh property assessor and web automation docs

2026-03-27 21:35:55 -05:00
parent eeea0c8ef1
commit 19adb919fc
4 changed files with 163 additions and 2 deletions


@@ -15,6 +15,7 @@ Automated web browsing and scraping using Playwright-compatible CloakBrowser, wi
- Use `node skills/web-automation/scripts/extract.js "<URL>"` for one-shot extraction from a single URL
- Use `npx tsx scrape.ts ...` for markdown scraping modes
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` for interactive or authenticated flows
- Use `node skills/web-automation/scripts/zillow-photos.js "<listing-url>"` or `har-photos.js` for real-estate photo extraction before attempting generic gallery automation
## Requirements
@@ -59,6 +60,17 @@ pnpm rebuild better-sqlite3 esbuild
Without this, helper scripts may fail before launch because the native bindings are missing.
## Prerequisite check
Before running automation, verify the local install and CloakBrowser wiring:
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node check-install.js
```
If this fails, stop and fix setup before troubleshooting site automation.
## Exec approvals allowlist
If OpenClaw keeps prompting for approval when running this skill, add a local allowlist for the main agent:
@@ -80,13 +92,24 @@ Notes:
- If `node` lives somewhere else, replace `/opt/homebrew/bin/node` with the output of `which node`.
- If matching is inconsistent, replace `~/.openclaw/...` with the full absolute path for the machine.
- Keep the allowlist scoped to the main agent unless there is a clear reason to widen it.
- Prefer file-based commands like `node check-install.js`, `node zillow-photos.js ...`, and `node har-photos.js ...` over inline `node -e ...`. Inline interpreter eval is more likely to trigger approval friction.
## Common commands
```bash
# Install / wiring check
cd ~/.openclaw/workspace/skills/web-automation/scripts
node check-install.js
# One-shot JSON extraction
node skills/web-automation/scripts/extract.js "https://example.com"
# Zillow photo extraction
node skills/web-automation/scripts/zillow-photos.js "https://www.zillow.com/homedetails/..."
# HAR photo extraction
node skills/web-automation/scripts/har-photos.js "https://www.har.com/homedetail/..."
# Browse a page with persistent profile
npx tsx browse.ts --url "https://example.com"
@@ -100,6 +123,58 @@ npx tsx auth.ts --url "https://example.com/login"
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```
## Real-estate photo extraction
Use the dedicated Zillow and HAR extractors before trying a free-form gallery flow.
### Zillow
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node zillow-photos.js "https://www.zillow.com/homedetails/4141-Whiteley-Dr-Corpus-Christi-TX-78418/2103723704_zpid/"
```
What it does:
- opens the listing page with CloakBrowser
- tries the `See all photos` / `See all X photos` entry point
- if the click-through path proves flaky, falls back to the listing's embedded `__NEXT_DATA__` payload
- returns direct `photos.zillowstatic.com` image URLs as JSON
Expected success shape:
- `complete: true`
- `expectedPhotoCount` matches `photoCount`
- `imageUrls` contains the listing photo set
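The `__NEXT_DATA__` fallback can be approximated with a small standalone sketch (the function names and payload traversal here are illustrative; the real script may read a specific path in Zillow's payload rather than walking it):

```javascript
// Hypothetical sketch of the __NEXT_DATA__ fallback: pull the embedded JSON
// payload out of the listing HTML, then collect any zillowstatic photo URLs.
function extractNextData(html) {
  const match = html.match(
    /<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  return match ? JSON.parse(match[1]) : null;
}

function collectPhotoUrls(value, urls = new Set()) {
  // Walk the payload recursively; keep strings that look like listing photos.
  if (typeof value === "string") {
    if (value.includes("photos.zillowstatic.com")) urls.add(value);
  } else if (value && typeof value === "object") {
    for (const v of Object.values(value)) collectPhotoUrls(v, urls);
  }
  return urls;
}
```

With a Playwright-compatible page, the HTML would come from something like `await page.content()` before calling `extractNextData`.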
### HAR
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node har-photos.js "https://www.har.com/homedetail/4141-whiteley-dr-corpus-christi-tx-78418/14069438"
```
What it does:
- opens the HAR listing page
- clicks `Show all photos` / `View all photos`
- extracts the direct `pics.harstatic.com` image URLs from the all-photos page
Expected success shape:
- `complete: true`
- `expectedPhotoCount` matches `photoCount`
- `imageUrls` contains the listing photo set
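Both extractors report the same success shape, so a shared check can be sketched as follows (assuming only the JSON fields listed above; the helper name is hypothetical):

```javascript
// Sketch of a success check for either extractor's JSON output, using only
// the documented fields: complete, expectedPhotoCount, photoCount, imageUrls.
function isCompleteResult(result) {
  return (
    result.complete === true &&
    result.expectedPhotoCount === result.photoCount &&
    Array.isArray(result.imageUrls) &&
    result.imageUrls.length === result.photoCount
  );
}
```

For example, the stdout of `node har-photos.js "<har-listing-url>"` could be parsed with `JSON.parse` and passed to this check in a regression script.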
### Test commands
From `skills/web-automation/scripts`:
```bash
node check-install.js
npm run test:photos
node zillow-photos.js "<zillow-listing-url>"
node har-photos.js "<har-listing-url>"
```
Use the live Zillow and HAR URLs above for a known-good regression check.
## One-shot extraction (`extract.js`)
Use `extract.js` when the task is simply to open one URL, render it, and return structured content.