Stefano Fiorini b77134ced5 Use Zillow parcel hints for CAD lookup

2026-03-28 03:55:56 -05:00

9.9 KiB

name	description
web-automation	Browse and scrape web pages using Playwright-compatible CloakBrowser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or scraping websites with bot protection.

Web Automation with CloakBrowser (Codex)

Automated web browsing and scraping using Playwright-compatible CloakBrowser with two execution paths under one skill:

one-shot extraction via extract.js
broader stateful automation via CloakBrowser and the existing auth.ts, browse.ts, flow.ts, and scrape.ts

When To Use Which Command

Use node scripts/extract.js "<URL>" for one-shot extraction from a single URL when you need rendered content, bounded stealth behavior, and JSON output.
Use npx tsx scrape.ts ... when you need markdown output, Readability extraction, full-page cleanup, or selector-based scraping.
Use npx tsx browse.ts ..., auth.ts, or flow.ts when the task needs interactive navigation, persistent sessions, login handling, click/type actions, or multi-step workflows.

Requirements

Node.js 20+
pnpm
Network access to download the CloakBrowser binary on first use or via preinstall

First-Time Setup

cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm install
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild

Updating CloakBrowser

cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm up cloakbrowser playwright-core
npx cloakbrowser install
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild

Prerequisite Check (MANDATORY)

Before running any automation, verify CloakBrowser and Playwright Core dependencies are installed and scripts are configured to use CloakBrowser.

cd ~/.openclaw/workspace/skills/web-automation/scripts
node check-install.js

If any check fails, stop and return:

"Missing dependency/config: web-automation requires cloakbrowser and playwright-core with CloakBrowser-based scripts. Run setup in this skill, then retry."

If runtime fails with missing native bindings for better-sqlite3 or esbuild, run:

cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild

Quick Reference

Install check: node check-install.js
Zillow listing discovery from address: node scripts/zillow-discover.js "4141 Whiteley Dr, Corpus Christi, TX 78418"
HAR listing discovery from address: node scripts/har-discover.js "4141 Whiteley Dr, Corpus Christi, TX 78418"
One-shot JSON extract: node scripts/extract.js "https://example.com"
Zillow photo URLs: node scripts/zillow-photos.js "https://www.zillow.com/homedetails/..."
HAR photo URLs: node scripts/har-photos.js "https://www.har.com/homedetail/..."
Browse page: npx tsx browse.ts --url "https://example.com"
Scrape markdown: npx tsx scrape.ts --url "https://example.com" --mode main --output page.md
Authenticate: npx tsx auth.ts --url "https://example.com/login"
Natural-language flow: npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'

Messaging rule:

For WhatsApp or similar chat-driven runs, prefer native web_search, web_fetch, and bounded browser actions over shelling out to these helper scripts for every core step.
Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks.
If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment.

OpenClaw Exec Approvals / Allowlist

If OpenClaw prompts for exec approval every time this skill runs, add a local approvals allowlist for the main agent before retrying. This is especially helpful for repeated extract.js, browse.ts, and other CloakBrowser-backed scrapes.

openclaw approvals allowlist add --agent main "/opt/homebrew/bin/node"
openclaw approvals allowlist add --agent main "/usr/bin/env"
openclaw approvals allowlist add --agent main "~/.openclaw/workspace/skills/web-automation/scripts/*.js"
openclaw approvals allowlist add --agent main "~/.openclaw/workspace/skills/web-automation/scripts/node_modules/.bin/*"

Then verify:

openclaw approvals get

Notes:

If node lives somewhere else on the host, replace /opt/homebrew/bin/node with the output of which node.
If matching problems persist, replace ~/.openclaw/... with the full absolute path such as /Users/<user>/.openclaw/....
Keep the allowlist scoped to the main agent unless there is a real reason to broaden it.
Prefer file-based commands like node check-install.js or node scripts/zillow-photos.js ... over inline interpreter eval (node -e, node --input-type=module -e). OpenClaw exec approvals treat inline eval as a higher-friction path.
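
The notes above amount to substituting two host-specific values into the same four commands. A minimal sketch of that substitution follows; buildAllowlistCommands is a hypothetical helper, not part of this skill or of OpenClaw, and the example paths are placeholders.

```javascript
// Hypothetical helper: rebuilds the four allowlist commands shown above
// for a specific host. nodePath is the output of `which node`; homeDir is
// the absolute home directory used in place of "~".
function buildAllowlistCommands(nodePath, homeDir, agent = "main") {
  const scriptsDir = `${homeDir}/.openclaw/workspace/skills/web-automation/scripts`;
  const patterns = [
    nodePath,
    "/usr/bin/env",
    `${scriptsDir}/*.js`,
    `${scriptsDir}/node_modules/.bin/*`,
  ];
  return patterns.map(
    (pattern) => `openclaw approvals allowlist add --agent ${agent} "${pattern}"`
  );
}

// Placeholder values for illustration only.
const commands = buildAllowlistCommands("/usr/local/bin/node", "/Users/alice");
console.log(commands.join("\n"));
```

Printing the commands rather than running them keeps the approval step manual, which fits the advice to keep the allowlist narrowly scoped.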

One-shot extraction

Use extract.js when you need a single page fetch with JavaScript rendering and lightweight anti-bot shaping, but not a full automation session.

node scripts/extract.js "https://example.com"
WAIT_TIME=5000 node scripts/extract.js "https://example.com"
SCREENSHOT_PATH=/tmp/page.png SAVE_HTML=true node scripts/extract.js "https://example.com"

Output is JSON only and includes fields such as:

requestedUrl
finalUrl
title
content
metaDescription
status
elapsedSeconds
challengeDetected
optional screenshot
optional htmlFile
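
A caller will usually want to check this JSON before trusting the content. The sketch below shows one way to do that. The field names come from the list above, but the value types, the choice of which fields count as required, and the sample payload are all assumptions made for illustration.

```javascript
// Sketch: summarize the JSON that extract.js prints to stdout.
// Field names come from the list above; value types and the set of
// "required" fields are assumptions for this example.
function summarizeExtractOutput(stdoutText) {
  const result = JSON.parse(stdoutText);
  const required = ["requestedUrl", "finalUrl", "title", "content", "status"];
  const missing = required.filter((key) => !(key in result));
  return {
    // Treat a detected challenge page as a failed extraction.
    ok: missing.length === 0 && result.challengeDetected !== true,
    missing,
    redirected: result.requestedUrl !== result.finalUrl,
    hasScreenshot: typeof result.screenshot === "string",
    hasHtmlFile: typeof result.htmlFile === "string",
  };
}

// Hand-written sample for illustration, not real extract.js output.
const sampleOutput = JSON.stringify({
  requestedUrl: "https://example.com",
  finalUrl: "https://example.com/",
  title: "Example Domain",
  content: "Example body text",
  metaDescription: "",
  status: 200,
  elapsedSeconds: 1.2,
  challengeDetected: false,
});

const summary = summarizeExtractOutput(sampleOutput);
console.log(summary);
```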

General flow runner

Use flow.ts for multi-step commands in plain language (go/click/type/press/wait/screenshot).

Example:

npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
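
When the steps are generated by code rather than typed by hand, the instruction string can be assembled from a list. This is a minimal sketch that assumes only what the example above shows, namely that steps are joined with " then ". buildInstruction and shellSingleQuote are hypothetical helpers, not part of flow.ts.

```javascript
// Hypothetical helper: joins individual steps with " then ", the
// separator used in the flow.ts examples in this document.
function buildInstruction(steps) {
  return steps
    .map((step) => step.trim())
    .filter(Boolean)
    .join(" then ");
}

// Quote a value for a POSIX shell by wrapping it in single quotes and
// escaping any embedded single quote.
function shellSingleQuote(value) {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

const instruction = buildInstruction([
  "go to https://search.fiorinis.com",
  'type "pippo"',
  "press enter",
  "wait 2s",
]);
const command = `npx tsx flow.ts --instruction ${shellSingleQuote(instruction)}`;
console.log(command);
```

Single-quoting the whole instruction leaves the double quotes around typed text intact, as in the original example.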

Real-estate photo extraction

Use the dedicated extractors before trying a free-form gallery flow.

Zillow discovery: node scripts/zillow-discover.js "<street-address>"
HAR discovery: node scripts/har-discover.js "<street-address>"
Zillow photos: node scripts/zillow-photos.js "<listing-url>"
Zillow identifiers: node scripts/zillow-identifiers.js "<listing-url>"
HAR photos: node scripts/har-photos.js "<listing-url>"

The discovery scripts are purpose-built for the common address-to-listing workflow:

open the site search or address URL
keep apartment / unit identifiers when the address includes them
resolve or identify a matching listing page when possible
reject a mismatched unit when the requested address includes one
still work normally for single-family / no-unit addresses
return the direct listing URL as JSON
support longer source-specific timeouts when a caller such as property-assessor imports them for slower exact-unit Zillow pages
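
The unit-handling rules above can be sketched as a small comparison. This is not the logic used by zillow-discover.js or har-discover.js. The pattern, the function names, and the decision to compare only the unit (not the street) are assumptions chosen to illustrate the rule.

```javascript
// Assumed pattern for unit designators such as "Apt 4B", "Unit 12", "#4B".
const UNIT_PATTERN =
  /(?:\b(?:apt|apartment|unit|ste|suite)\b\.?\s*#?\s*|#\s*)([a-z0-9-]+)/i;

// Returns the unit identifier in upper case, or null when there is none.
function extractUnit(address) {
  const match = UNIT_PATTERN.exec(address);
  return match ? match[1].toUpperCase() : null;
}

// Accept any candidate when the request has no unit (single-family case);
// otherwise require the candidate to carry the same unit.
function acceptsCandidate(requestedAddress, candidateAddress) {
  const requestedUnit = extractUnit(requestedAddress);
  if (requestedUnit === null) return true;
  return extractUnit(candidateAddress) === requestedUnit;
}

console.log(
  acceptsCandidate(
    "123 Main St Apt 4B, Houston, TX 77002",
    "123 Main St Unit 5A, Houston, TX 77002"
  )
);
```

A fuller check would also compare the street number and name; this sketch isolates the unit rule only.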

The photo scripts are purpose-built for the common See all photos / Show all photos workflow:

open the listing page
on Zillow, first inspect the rendered listing shell for a complete structured __NEXT_DATA__ photo set
if the visible page count is missing, trust the structured Zillow photo set when page metadata confirms the count or when the embedded set is already clearly substantial
only force the all-photos click path when the initial Zillow page data is incomplete
wait for the resulting photo page or scroller view when the click path is actually needed
extract direct image URLs from the rendered page
fail fast with a timeout instead of hanging indefinitely when the browser-backed extraction stalls
support longer source-specific timeouts when a caller such as property-assessor imports them for slower exact-unit Zillow renders
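
The Zillow decision between trusting the embedded photo set and forcing the all-photos click can be sketched as below. This is an interpretation of the steps above, not the script's actual code. The threshold for "clearly substantial", the parameter names, and the treatment of a present visible count are all assumptions.

```javascript
// Assumed threshold for a "clearly substantial" embedded photo set;
// the real script may use a different value or rule.
const SUBSTANTIAL_PHOTO_COUNT = 12;

// Returns true when the all-photos click path should be forced.
function shouldClickAllPhotos({
  structuredPhotoCount,
  visibleCount = null,
  metadataCount = null,
}) {
  // Nothing embedded: the page data is incomplete.
  if (structuredPhotoCount === 0) return true;
  // Visible count available: the embedded set must cover it.
  if (visibleCount !== null) return structuredPhotoCount < visibleCount;
  // Visible count missing: trust the set if metadata confirms it...
  if (metadataCount !== null && structuredPhotoCount >= metadataCount) {
    return false;
  }
  // ...or if it is already clearly substantial.
  return structuredPhotoCount < SUBSTANTIAL_PHOTO_COUNT;
}

console.log(shouldClickAllPhotos({ structuredPhotoCount: 8, metadataCount: 8 }));
```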

Output is JSON with:

requestedUrl
finalUrl
clickedLabel
photoCount
imageUrls
notes
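
A consumer of this output may want a quick sanity check before using the URLs. The sketch below assumes imageUrls is an array of strings, photoCount is a number meant to equal the number of distinct URLs, and clickedLabel is empty or null when no click was needed. None of those details are confirmed by this document, and the sample payload is hand-written.

```javascript
// Sketch: sanity-check the JSON printed by a photo script.
// The field semantics used here are assumptions (see the paragraph above).
function checkPhotoOutput(stdoutText) {
  const result = JSON.parse(stdoutText);
  const urls = Array.isArray(result.imageUrls) ? result.imageUrls : [];
  const uniqueUrls = [...new Set(urls)];
  return {
    uniqueUrls,
    countMatches: result.photoCount === uniqueUrls.length,
    usedClickPath: Boolean(result.clickedLabel),
  };
}

// Hand-written sample for illustration, not real script output.
const photoSample = JSON.stringify({
  requestedUrl: "https://example.com/listing/1",
  finalUrl: "https://example.com/listing/1",
  clickedLabel: "See all photos",
  photoCount: 2,
  imageUrls: [
    "https://example.com/photos/a.jpg",
    "https://example.com/photos/b.jpg",
    "https://example.com/photos/a.jpg",
  ],
  notes: [],
});

const photoCheck = checkPhotoOutput(photoSample);
console.log(photoCheck.uniqueUrls.length, photoCheck.countMatches);
```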

zillow-identifiers.js is a lighter helper for CAD/public-record workflows:

open the Zillow listing shell
inspect embedded __NEXT_DATA__ plus visible listing text
capture parcel/APN-style identifiers when Zillow exposes them
return those hints so property-assessor can use them as stronger CAD lookup keys than listing geo IDs
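
As a rough illustration of the "visible listing text" half of that inspection, the sketch below scans a block of text for a labeled parcel or APN value. It is not the logic in zillow-identifiers.js. The labels, the pattern, and the sample text are assumptions, and real listings may word this differently or omit it entirely.

```javascript
// Assumed pattern: a "Parcel number", "Parcel ID", or "APN" label followed
// by a value that starts with a digit. Purely illustrative.
const PARCEL_PATTERN =
  /\b(?:parcel\s+(?:number|id)|apn)\s*[:#]?\s*(\d[\dA-Z.-]*)/i;

// Returns the first parcel-style identifier found, or null.
function findParcelHint(visibleText) {
  const match = PARCEL_PATTERN.exec(visibleText);
  if (!match) return null;
  // Drop trailing punctuation picked up at the end of a sentence.
  return match[1].replace(/[.-]+$/, "");
}

// Hand-written sample text, not copied from a real listing.
const listingText = [
  "Year built: 1998",
  "Parcel number: 1234-5678-0010",
  "Lot size: 0.25 acres",
].join("\n");

console.log(findParcelHint(listingText));
```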

For property-assessor style workflows, prefer these dedicated commands over generic natural-language gallery automation.

Gallery/lightbox and all-photos workflows

For real-estate listings and other image-heavy pages, prefer the most accessible all-photos view first.

Practical rules:

A scrollable all-photos page, expanded photo grid, or photo list is an acceptable source for condition review if it clearly exposes the listing images.
Do not treat a listing page hero image, gallery collage preview, or modal landing view alone as a full photo review.
Only rely on next-arrow / slideshow traversal when the site does not provide an accessible all-photos view.
If using a gallery, confirm the image changed before counting the next screenshot as reviewed.
If a generic Next control exits the gallery or returns to the listing shell, stop and adjust the selector/interaction; do not claim the photos were reviewed.
Blind ArrowRight presses are not reliable enough unless you have already verified that they advance the gallery on that site.
For smaller listings, review all photos when practical; otherwise review enough distinct photos to cover kitchen, baths, living areas, bedrooms, exterior, and any waterfront/balcony/deck elements.
If automation cannot reliably access enough photos, say so explicitly in the final answer.

Where possible, prefer a site’s explicit See all photos / Show all photos path over fragile modal navigation.
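
The "confirm the image changed" rule can be enforced with a small tracker that only counts a photo as reviewed when the current image differs from the previous one. This is a minimal sketch; createReviewTracker is a hypothetical helper, and how the current image URL is read from the page is left to the caller.

```javascript
// Hypothetical helper: counts a gallery step as reviewed only when the
// displayed image actually changed since the previous step.
function createReviewTracker() {
  const reviewed = new Set();
  let lastImageUrl = null;
  return {
    // Returns true when the image changed, false when the press did not
    // advance the gallery (or no image URL could be read).
    record(currentImageUrl) {
      if (!currentImageUrl || currentImageUrl === lastImageUrl) return false;
      lastImageUrl = currentImageUrl;
      reviewed.add(currentImageUrl);
      return true;
    },
    // Number of distinct images seen so far.
    get reviewedCount() {
      return reviewed.size;
    },
  };
}

const tracker = createReviewTracker();
const steps = ["a.jpg", "a.jpg", "b.jpg"].map((url) => tracker.record(url));
console.log(steps, tracker.reviewedCount);
```

A false return is the signal to stop and adjust the selector or interaction rather than continue counting screenshots.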

Compatibility Aliases

CAMOUFOX_PROFILE_PATH still works as a legacy alias for CLOAKBROWSER_PROFILE_PATH
CAMOUFOX_HEADLESS still works as a legacy alias for CLOAKBROWSER_HEADLESS
CAMOUFOX_USERNAME and CAMOUFOX_PASSWORD still work as legacy aliases for CLOAKBROWSER_USERNAME and CLOAKBROWSER_PASSWORD
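
A legacy-alias lookup of this kind can be sketched as follows. The variable names come from the list above. The precedence (the CLOAKBROWSER_ name wins when both are set) and the readSetting helper are assumptions, not a description of how the scripts resolve them.

```javascript
// Mapping taken from the alias list above.
const LEGACY_ALIASES = {
  CLOAKBROWSER_PROFILE_PATH: "CAMOUFOX_PROFILE_PATH",
  CLOAKBROWSER_HEADLESS: "CAMOUFOX_HEADLESS",
  CLOAKBROWSER_USERNAME: "CAMOUFOX_USERNAME",
  CLOAKBROWSER_PASSWORD: "CAMOUFOX_PASSWORD",
};

// Hypothetical helper: reads the current name first, then falls back to
// the legacy alias. The precedence is an assumption.
function readSetting(name, env = process.env) {
  const legacyName = LEGACY_ALIASES[name];
  return env[name] ?? (legacyName ? env[legacyName] : undefined);
}

console.log(
  readSetting("CLOAKBROWSER_HEADLESS", { CAMOUFOX_HEADLESS: "false" })
);
```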

Notes

Sessions persist in CloakBrowser profile storage.
Use --wait for dynamic pages.
Use --mode selector --selector "..." for targeted extraction.
extract.js applies stealth and bounded anti-bot shaping while leaving the browser sandbox enabled.