Add Zillow and HAR photo extractors
@@ -48,8 +48,7 @@ Before running any automation, verify CloakBrowser and Playwright Core dependenc

```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node --input-type=module -e "await import('cloakbrowser');import 'playwright-core';console.log('OK: cloakbrowser + playwright-core installed')"
node -e "const fs=require('fs');const t=fs.readFileSync('browse.ts','utf8');if(!/import\s*\{[^}]*launchPersistentContext[^}]*\}\s*from\s*['\"]cloakbrowser['\"]/.test(t)){throw new Error('browse.ts is not configured for CloakBrowser')}console.log('OK: CloakBrowser integration detected in browse.ts')"
node check-install.js
```

If any check fails, stop and return:
@@ -66,7 +65,10 @@ pnpm rebuild better-sqlite3 esbuild

## Quick Reference

- Install check: `node check-install.js`
- One-shot JSON extract: `node scripts/extract.js "https://example.com"`
- Zillow photo URLs: `node scripts/zillow-photos.js "https://www.zillow.com/homedetails/..."`
- HAR photo URLs: `node scripts/har-photos.js "https://www.har.com/homedetail/..."`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
@@ -93,6 +95,7 @@ Notes:
- If `node` lives somewhere else on the host, replace `/opt/homebrew/bin/node` with the output of `which node`.
- If matching problems persist, replace `~/.openclaw/...` with the full absolute path such as `/Users/<user>/.openclaw/...`.
- Keep the allowlist scoped to the main agent unless there is a real reason to broaden it.
- Prefer file-based commands like `node check-install.js` or `node scripts/zillow-photos.js ...` over inline interpreter eval (`node -e`, `node --input-type=module -e`). OpenClaw exec approvals treat inline eval as a higher-friction path.

## One-shot extraction

@@ -127,6 +130,29 @@ Example:
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```

## Real-estate photo extraction

Use the dedicated extractors before trying a free-form gallery flow.

- Zillow: `node scripts/zillow-photos.js "<listing-url>"`
- HAR: `node scripts/har-photos.js "<listing-url>"`

These scripts are purpose-built for the common `See all photos` / `Show all photos` workflow:
- open the listing page
- click the all-photos entry point
- wait for the resulting photo page or scroller view
- extract direct image URLs from the rendered page
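
The four steps above can be sketched against the Playwright `Page` API. This is a minimal illustration and not the actual code of `zillow-photos.js` or `har-photos.js`: the function name `collectPhotoUrls`, the button-role lookup, the `networkidle` wait, and the shape of `notes` (an array of strings) are all assumptions. The returned field names follow the output fields documented in this section.

```javascript
// Labels taken from the workflow description; the real scripts may match others.
const ALL_PHOTOS_LABELS = ['See all photos', 'Show all photos'];

// `page` is expected to be a Playwright Page (or an object with the same methods).
async function collectPhotoUrls(page, listingUrl) {
  // 1. Open the listing page.
  await page.goto(listingUrl, { waitUntil: 'domcontentloaded' });

  // 2. Click the first all-photos entry point that is present.
  let clickedLabel = null;
  for (const label of ALL_PHOTOS_LABELS) {
    const button = page.getByRole('button', { name: label });
    if ((await button.count()) > 0) {
      await button.first().click();
      clickedLabel = label;
      break;
    }
  }

  // 3. Wait for the photo page or scroller view to settle.
  await page.waitForLoadState('networkidle');

  // 4. Read image URLs from the rendered DOM, keep http(s) only, drop duplicates.
  const rawUrls = await page.$$eval('img', (imgs) =>
    imgs.map((img) => img.currentSrc || img.src)
  );
  const imageUrls = [...new Set(rawUrls.filter((u) => /^https?:\/\//.test(u)))];

  return {
    requestedUrl: listingUrl,
    finalUrl: page.url(),
    clickedLabel,
    photoCount: imageUrls.length,
    imageUrls,
    // Assumption: notes is a list of free-text remarks.
    notes: clickedLabel ? [] : ['no all-photos entry point found'],
  };
}
```

Passing `page` in as a parameter keeps the sketch independent of how the browser is launched, so the same function could be driven by whatever persistent context the skill already sets up.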

Output is JSON with:
- `requestedUrl`
- `finalUrl`
- `clickedLabel`
- `photoCount`
- `imageUrls`
- `notes`
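
A caller that consumes this output may want to check the shape before using it. The sketch below is one way to do that; `parsePhotoResult` is a hypothetical helper, and the only type it assumes is that `imageUrls` is an array. The types of the other fields are not specified here, so they are checked for presence only.

```javascript
// Field names as listed in the output description.
const REQUIRED_FIELDS = [
  'requestedUrl',
  'finalUrl',
  'clickedLabel',
  'photoCount',
  'imageUrls',
  'notes',
];

// Parses extractor output and verifies that every documented field is present.
function parsePhotoResult(jsonText) {
  const result = JSON.parse(jsonText);
  if (result === null || typeof result !== 'object' || Array.isArray(result)) {
    throw new Error('expected a JSON object');
  }
  const missing = REQUIRED_FIELDS.filter((key) => !(key in result));
  if (missing.length > 0) {
    throw new Error(`missing fields: ${missing.join(', ')}`);
  }
  // Assumption: imageUrls is an array (of URL strings).
  if (!Array.isArray(result.imageUrls)) {
    throw new Error('imageUrls is not an array');
  }
  return result;
}
```

If the extractor prints its JSON to stdout (an assumption), the output can be redirected to a file with `node scripts/zillow-photos.js "<listing-url>" > photos.json` and the file contents passed to `parsePhotoResult`.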

For property-assessor style workflows, prefer these dedicated commands over generic natural-language gallery automation.

### Gallery/lightbox and all-photos workflows

For real-estate listings and other image-heavy pages, prefer the most accessible all-photos view first.
