Add Zillow and HAR photo extractors

2026-03-27 17:35:46 -05:00
parent e7c56fe760
commit eeea0c8ef1
11 changed files with 873 additions and 8 deletions


@@ -48,8 +48,7 @@ Before running any automation, verify CloakBrowser and Playwright Core dependenc
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node --input-type=module -e "await import('cloakbrowser');import 'playwright-core';console.log('OK: cloakbrowser + playwright-core installed')"
node -e "const fs=require('fs');const t=fs.readFileSync('browse.ts','utf8');if(!/import\s*\{[^}]*launchPersistentContext[^}]*\}\s*from\s*['\"]cloakbrowser['\"]/.test(t)){throw new Error('browse.ts is not configured for CloakBrowser')}console.log('OK: CloakBrowser integration detected in browse.ts')"
node check-install.js
```
If any check fails, stop and return:
@@ -66,7 +65,10 @@ pnpm rebuild better-sqlite3 esbuild
## Quick Reference
- Install check: `node check-install.js`
- One-shot JSON extract: `node scripts/extract.js "https://example.com"`
- Zillow photo URLs: `node scripts/zillow-photos.js "https://www.zillow.com/homedetails/..."`
- HAR photo URLs: `node scripts/har-photos.js "https://www.har.com/homedetail/..."`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
@@ -93,6 +95,7 @@ Notes:
- If `node` lives somewhere else on the host, replace `/opt/homebrew/bin/node` with the output of `which node`.
- If matching problems persist, replace `~/.openclaw/...` with the full absolute path such as `/Users/<user>/.openclaw/...`.
- Keep the allowlist scoped to the main agent unless there is a real reason to broaden it.
- Prefer file-based commands like `node check-install.js` or `node scripts/zillow-photos.js ...` over inline interpreter eval (`node -e`, `node --input-type=module -e`). OpenClaw exec approvals treat inline eval as a higher-friction path.
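The same advice applies to the inline `browse.ts` check in the install-verification block. A minimal sketch of that check as a reusable function that can live in a file instead of a `node -e` string. `hasCloakBrowserImport` is a hypothetical name. The regex is copied from the one-liner, and reading the file is left to the caller so the function stays pure.

```javascript
// Same pattern as the inline `node -e` check: a named import of
// launchPersistentContext from 'cloakbrowser' (single or double quotes).
const CLOAK_IMPORT_RE =
  /import\s*\{[^}]*launchPersistentContext[^}]*\}\s*from\s*['"]cloakbrowser['"]/;

// Hypothetical helper: true when the given source text contains that import.
function hasCloakBrowserImport(source) {
  return CLOAK_IMPORT_RE.test(source);
}

// Example wiring from a standalone script run in scripts/ (hypothetical):
// const fs = require('fs');
// if (!hasCloakBrowserImport(fs.readFileSync('browse.ts', 'utf8'))) {
//   throw new Error('browse.ts is not configured for CloakBrowser');
// }
```

Like the original one-liner, the pattern only recognizes a named import. A namespace import such as `import * as cloak from 'cloakbrowser'` would be reported as missing.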
## One-shot extraction
@@ -127,6 +130,29 @@ Example:
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```
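The `--instruction` string chains steps with `then`. As an illustration only (the actual grammar that `flow.ts` accepts is not documented here), a hypothetical `splitFlowInstruction` helper that splits such a string into ordered steps while keeping double-quoted text like `"pippo"` intact:

```javascript
// Illustrative only: not flow.ts's real parser.
// Splits on the standalone word "then", but never inside double quotes.
function splitFlowInstruction(instruction) {
  // Each token is either a whole double-quoted string or a run of non-spaces.
  const tokens = instruction.match(/"[^"]*"|\S+/g) ?? [];
  const steps = [];
  let current = [];
  for (const token of tokens) {
    if (token.toLowerCase() === 'then') {
      // Close the current step; ignore empty steps from "then then".
      if (current.length > 0) steps.push(current.join(' '));
      current = [];
    } else {
      current.push(token);
    }
  }
  if (current.length > 0) steps.push(current.join(' '));
  return steps;
}
```

For the example instruction above, this yields four steps: `go to https://search.fiorinis.com`, `type "pippo"`, `press enter`, and `wait 2s`.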
## Real-estate photo extraction
Use the dedicated extractors before trying a free-form gallery flow.
- Zillow: `node scripts/zillow-photos.js "<listing-url>"`
- HAR: `node scripts/har-photos.js "<listing-url>"`
These scripts are purpose-built for the common `See all photos` / `Show all photos` workflow:
- open the listing page
- click the all-photos entry point
- wait for the resulting photo page or scroller view
- extract direct image URLs from the rendered page
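The final step can be pictured as a normalization pass over what the rendered page exposes. A minimal sketch, assuming the raw `src` and `srcset` values of `<img>` elements have already been collected. `normalizeImageUrls` and `pickLargestFromSrcset` are hypothetical names, and the selection logic the real scripts use is not shown here.

```javascript
// Hypothetical: choose the widest candidate from an <img> srcset string.
function pickLargestFromSrcset(srcset) {
  let best = null;
  let bestWidth = -1;
  // Heuristic split: candidates are separated by a comma plus whitespace.
  for (const candidate of srcset.split(/,\s+/)) {
    const [url, descriptor = ''] = candidate.trim().replace(/,$/, '').split(/\s+/);
    if (!url) continue;
    // Width descriptors look like "1536w"; density descriptors ("2x") count as 0.
    const width = descriptor.endsWith('w') ? parseInt(descriptor, 10) : 0;
    if (width > bestWidth) {
      best = url;
      bestWidth = width;
    }
  }
  return best;
}

// Hypothetical: prefer the largest srcset candidate, fall back to src,
// keep only http(s) URLs, and dedupe while preserving page order.
function normalizeImageUrls(images) {
  const seen = new Set();
  const result = [];
  for (const { src, srcset } of images) {
    const url = (srcset && pickLargestFromSrcset(srcset)) || src;
    if (!url || !/^https?:\/\//i.test(url) || seen.has(url)) continue;
    seen.add(url);
    result.push(url);
  }
  return result;
}
```

Filtering to `http(s)` drops inline `data:` placeholders that lazy-loading galleries often use, which keeps the result limited to direct image URLs.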
Output is a JSON object with these fields:
- `requestedUrl`
- `finalUrl`
- `clickedLabel`
- `photoCount`
- `imageUrls`
- `notes`
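A caller can sanity-check this output before relying on it. A minimal sketch with a hypothetical `parseExtractorOutput` helper. It assumes `photoCount` is meant to equal `imageUrls.length`, which is not stated above, so adjust that check to what the scripts actually emit.

```javascript
// Field names as listed for the zillow-photos.js / har-photos.js output.
const REQUIRED_FIELDS = [
  'requestedUrl', 'finalUrl', 'clickedLabel', 'photoCount', 'imageUrls', 'notes',
];

// Hypothetical validator: parse extractor stdout and fail loudly on surprises.
function parseExtractorOutput(stdout) {
  const data = JSON.parse(stdout);
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new Error('extractor output should be a JSON object');
  }
  const missing = REQUIRED_FIELDS.filter((field) => !(field in data));
  if (missing.length > 0) {
    throw new Error(`extractor output missing: ${missing.join(', ')}`);
  }
  if (!Array.isArray(data.imageUrls)) {
    throw new Error('imageUrls should be an array');
  }
  // Assumption: photoCount is the number of URLs returned.
  if (data.photoCount !== data.imageUrls.length) {
    throw new Error(`photoCount ${data.photoCount} does not match ${data.imageUrls.length} URLs`);
  }
  return data;
}
```

In practice the input string would be the stdout of `node scripts/zillow-photos.js "<listing-url>"` or `node scripts/har-photos.js "<listing-url>"`.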
For property-assessor-style workflows, prefer these dedicated commands over generic natural-language gallery automation.
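Choosing between the two extractors can be keyed on the listing host. A minimal sketch with a hypothetical `pickPhotoExtractor` helper. The host rules are assumptions drawn from the example URLs (`www.zillow.com/homedetails/...`, `www.har.com/homedetail/...`), the `scripts/...` paths mirror the Quick Reference commands and assume the same working directory, and `null` means falling back to the generic gallery workflow.

```javascript
// Hypothetical mapping from listing domain to dedicated extractor script.
const EXTRACTORS = [
  { domain: 'zillow.com', script: 'scripts/zillow-photos.js' },
  { domain: 'har.com', script: 'scripts/har-photos.js' },
];

function pickPhotoExtractor(listingUrl) {
  let host;
  try {
    host = new URL(listingUrl).hostname.toLowerCase();
  } catch {
    return null; // not a valid absolute URL
  }
  for (const { domain, script } of EXTRACTORS) {
    // Match the domain or any subdomain (www.zillow.com), not look-alikes (notzillow.com).
    if (host === domain || host.endsWith('.' + domain)) return script;
  }
  return null; // no dedicated extractor: use the generic gallery flow
}
```

The returned script and the listing URL can then be passed as separate arguments, as in `node scripts/har-photos.js "<listing-url>"`, which keeps the call file-based rather than an inline eval.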
### Gallery/lightbox and all-photos workflows
For real-estate listings and other image-heavy pages, prefer the most accessible all-photos view first.