diff --git a/docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md b/docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md new file mode 100644 index 0000000..84faa47 --- /dev/null +++ b/docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md @@ -0,0 +1,180 @@ +# Property Assessor WhatsApp-Safe Runtime Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. + +**Goal:** Keep WhatsApp property-assessor runs moving by failing fast on silent discovery/photo hangs, avoiding helper subprocesses during core analysis, and reserving subprocess use to the final PDF render attempt only. + +**Architecture:** Add small timeout guards around the existing in-process listing discovery and photo extraction calls so one quiet browser-backed task cannot stall the entire assessment. Tighten the live skill and published docs so messaging runs treat chat-native source collection as the default path and helper commands as non-chat or final-render-only tools. + +**Tech Stack:** TypeScript, Node test runner, existing OpenClaw property-assessor skill, existing OpenClaw web-automation modules, Markdown docs. + +--- + +### Task 1: Add failing timeout tests for discovery and photo extraction + +**Files:** +- Modify: `skills/property-assessor/tests/assessment.test.ts` + +**Step 1: Write the failing test** + +Add tests that stub discovery/photo functions with never-resolving promises and assert that: +- listing discovery returns `null` URLs and records timeout attempts +- photo extraction returns `not completed` instead of hanging forever + +**Step 2: Run test to verify it fails** + +Run: `npm test -- --test-name-pattern "times out"` +Expected: FAIL because current code never times out or records timeout attempts. + +**Step 3: Write minimal implementation** + +Implement the smallest timeout wrapper needed for the tests to pass. + +**Step 4: Run test to verify it passes** + +Run: `npm test -- --test-name-pattern "times out"` +Expected: PASS + +**Step 5: Commit** + +```bash +git add skills/property-assessor/tests/assessment.test.ts +git commit -m "test: cover stalled discovery and photo extraction" +``` + +### Task 2: Implement hard timeout guards in the live assessment path + +**Files:** +- Create: `skills/property-assessor/src/async-timeout.ts` +- Modify: `skills/property-assessor/src/listing-discovery.ts` +- Modify: `skills/property-assessor/src/photo-review.ts` + +**Step 1: Write the failing test** + +Use the tests from Task 1 as the red phase. + +**Step 2: Run test to verify it fails** + +Run: `npm test -- --test-name-pattern "times out"` +Expected: FAIL + +**Step 3: Write minimal implementation** + +Add: +- a shared timeout helper for async operations +- timeout-wrapped Zillow/HAR discovery in `listing-discovery.ts` +- timeout-wrapped Zillow/HAR photo extraction in `photo-review.ts` +- clear timeout attempt messages so the assessment can continue honestly + +**Step 4: Run test to verify it passes** + +Run: `npm test -- --test-name-pattern "times out"` +Expected: PASS + +**Step 5: Commit** + +```bash +git add skills/property-assessor/src/async-timeout.ts skills/property-assessor/src/listing-discovery.ts skills/property-assessor/src/photo-review.ts skills/property-assessor/tests/assessment.test.ts +git commit -m "fix: fail fast on stalled property-assessor extraction steps" +``` + +### Task 3: Tighten live skill instructions for WhatsApp-safe execution + +**Files:** +- Modify: `../skills/property-assessor/SKILL.md` + +**Step 1: Write the failing test** + +No automated test. Use the documented runtime rule as the spec: +- WhatsApp/messaging runs must avoid helper subprocesses for core analysis +- only the final PDF render attempt may use the helper subprocess path +- `update?` must remain status-only + +**Step 2: Run verification to confirm current docs are wrong** + +Run: `rg -n "scripts/property-assessor assess|node zillow-photos|node har-photos|Good:" ../skills/property-assessor/SKILL.md` +Expected: current doc still presents helper commands as normal chat-safe core workflow. + +**Step 3: Write minimal implementation** + +Update the live skill doc to: +- prefer `web_search`, `web_fetch`, and bounded `web-automation` for core assessment +- forbid `scripts/property-assessor assess`, `node zillow-photos.js`, `node har-photos.js`, and ad hoc `curl` as the default WhatsApp core path +- allow a single final PDF render attempt only after a decision-grade verdict exists + +**Step 4: Run verification** + +Run: `sed -n '1,220p' ../skills/property-assessor/SKILL.md` +Expected: the WhatsApp-safe runtime rules are explicit and unambiguous. + +**Step 5: Commit** + +```bash +git add ../skills/property-assessor/SKILL.md +git commit -m "docs: clarify whatsapp-safe property-assessor execution" +``` + +### Task 4: Mirror the runtime guidance into the published repo docs + +**Files:** +- Modify: `docs/property-assessor.md` +- Modify: `docs/web-automation.md` + +**Step 1: Write the failing test** + +No automated test. The spec is consistency with the live skill instructions. + +**Step 2: Run verification to confirm current docs drift** + +Run: `rg -n "node zillow-photos|node har-photos|assess --address" docs/property-assessor.md docs/web-automation.md` +Expected: current docs still imply subprocess-heavy commands are the standard chat path. + +**Step 3: Write minimal implementation** + +Document: +- chat-native assessment first +- timeout-protected discovery/photo extraction behavior +- final-render-only subprocess attempt from messaging runs + +**Step 4: Run verification** + +Run: `sed -n '1,220p' docs/property-assessor.md && sed -n '1,220p' docs/web-automation.md` +Expected: published docs match the live skill behavior. + +**Step 5: Commit** + +```bash +git add docs/property-assessor.md docs/web-automation.md +git commit -m "docs: document whatsapp-safe property assessment flow" +``` + +### Task 5: Verify the focused runtime behavior + +**Files:** +- Modify: `skills/property-assessor/tests/assessment.test.ts` +- Verify: `skills/property-assessor/src/*.ts` +- Verify: `../skills/property-assessor/SKILL.md` +- Verify: `docs/property-assessor.md` +- Verify: `docs/web-automation.md` + +**Step 1: Run focused tests** + +Run: `npm test` +Expected: all `property-assessor` tests pass, including timeout coverage. + +**Step 2: Run targeted source verification** + +Run: `rg -n "withTimeout|timed out|final PDF render" skills/property-assessor/src ../skills/property-assessor/SKILL.md docs/property-assessor.md docs/web-automation.md` +Expected: timeout guards and the final-render-only messaging rule are present. + +**Step 3: Inspect git status** + +Run: `git status --short` +Expected: only intended files are modified. + +**Step 4: Commit** + +```bash +git add skills/property-assessor/src/async-timeout.ts skills/property-assessor/src/listing-discovery.ts skills/property-assessor/src/photo-review.ts skills/property-assessor/tests/assessment.test.ts ../skills/property-assessor/SKILL.md docs/property-assessor.md docs/web-automation.md docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md +git commit -m "fix: make property-assessor safer for whatsapp runs" +``` diff --git a/docs/property-assessor.md b/docs/property-assessor.md index 574b266..1801bf2 100644 --- a/docs/property-assessor.md +++ b/docs/property-assessor.md @@ -67,6 +67,9 @@ Operational rule: - Preliminary helper output should be treated as structured scaffolding for the remaining work, not as a reason to stop and wait for another user nudge. - In chat/messaging runs, do not waste the turn on `npm install` or `npm ci` when the local skill dependencies are already present. - If the user asks `update?` or `and?` mid-run, treat that as a status request and continue the same assessment rather than restarting or stopping at the last checkpoint. +- In WhatsApp or similar messaging runs, keep the core analysis on native `web_search`, `web_fetch`, and bounded browser actions. Do not make helper subprocesses the default path for discovery, CAD lookup, or photo extraction. +- In those messaging runs, reserve subprocess use for a single final `render-report` attempt after the verdict and fair-value range are complete. +- If final PDF render/send fails, return the completed decision-grade report in chat and report delivery failure separately rather than restarting the whole assessment. ### `assess` @@ -84,6 +87,7 @@ Current behavior: - automatically tries to discover Zillow and HAR listing URLs from the address when no listing URL is provided - runs Zillow photo extraction first, then HAR as fallback when available - reuses the OpenClaw web-automation logic in-process instead of spawning nested helper commands +- fails fast when Zillow/HAR discovery or photo extraction stalls instead of hanging indefinitely - returns a structured preliminary report payload - does not require recipient email(s) for the analysis-only run - asks for recipient email(s) only when PDF rendering is explicitly requested @@ -226,7 +230,9 @@ When those extractors return `imageUrls`, that returned set is the photo-review ## Approval-safe command shape -For chat-driven runs, prefer file-based commands. +For local/manual runs, prefer file-based commands. + +For chat-driven messaging runs, prefer native `web_search`, `web_fetch`, and bounded browser actions first. Treat these commands as local/manual helpers or non-chat fallbacks, not as the default core-analysis path. Good: @@ -240,6 +246,13 @@ Good: - `scripts/property-assessor render-report --input ... --output ...` - `web_fetch` to read an official CAD / assessor page when the TypeScript helper needs a plain page fetch +Messaging default: + +- `web_search` to discover listing and public-record URLs +- `web_fetch` for official CAD / assessor pages and accessible listing pages +- bounded `web-automation` actions for rendered listing/photo views +- one final `scripts/property-assessor render-report ...` attempt only after the decision-grade report is complete + Avoid when possible: - `node -e "..."` diff --git a/docs/web-automation.md b/docs/web-automation.md index fd825e8..6cb9f5d 100644 --- a/docs/web-automation.md +++ b/docs/web-automation.md @@ -18,6 +18,11 @@ Automated web browsing and scraping using Playwright-compatible CloakBrowser, wi - Use `node skills/web-automation/scripts/zillow-discover.js ""` or `har-discover.js` to resolve a real-estate listing URL from an address - Use `node skills/web-automation/scripts/zillow-photos.js ""` or `har-photos.js` for real-estate photo extraction before attempting generic gallery automation +Messaging rule: +- For WhatsApp or similar chat-driven runs, prefer native `web_search`, `web_fetch`, and bounded browser actions over shelling out to helper scripts for every core step. +- Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks. +- If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment. + ## Requirements - Node.js 20+ @@ -148,6 +153,7 @@ What it does: - resolves directly to a property page when Zillow supports the address slug - otherwise looks for a `homedetails` listing link in the rendered page - returns the discovered listing URL as JSON +- fails fast with a timeout if the browser-backed discovery stalls ### HAR discovery @@ -161,6 +167,7 @@ What it does: - looks for a confident `homedetail` match in rendered results - returns the discovered listing URL when HAR exposes a strong enough match - returns `listingUrl: null` when HAR discovery is not confident enough +- fails fast with a timeout if the browser-backed discovery stalls ### Zillow @@ -174,6 +181,7 @@ What it does: - tries the `See all photos` / `See all X photos` entry point - if Zillow keeps the click path flaky, falls back to the listing's embedded `__NEXT_DATA__` payload - returns direct `photos.zillowstatic.com` image URLs as JSON +- fails fast with a timeout if the browser-backed extraction stalls Expected success shape: - `complete: true` @@ -191,6 +199,7 @@ What it does: - opens the HAR listing page - clicks `Show all photos` / `View all photos` - extracts the direct `pics.harstatic.com` image URLs from the all-photos page +- fails fast with a timeout if the browser-backed extraction stalls Expected success shape: - `complete: true` diff --git a/skills/property-assessor/SKILL.md b/skills/property-assessor/SKILL.md index 9fce662..5b02489 100644 --- a/skills/property-assessor/SKILL.md +++ b/skills/property-assessor/SKILL.md @@ -56,10 +56,14 @@ Rules: - Avoid fragile interactive gallery flows if they are likely to require approval or bounce the user into Control UI. - If a richer photo pass would require approval, do not silently force that path first. Continue with the best approval-free workflow available and clearly lower confidence if needed. - Only escalate to approval-heavy browser interaction when there is no reasonable alternative and the extra fidelity materially changes the assessment. +- In WhatsApp or similar messaging runs, keep the core assessment on native chat-safe tools: `web_search`, `web_fetch`, and bounded `web-automation` browsing/extraction. +- In those messaging runs, do **not** make `scripts/property-assessor assess`, `scripts/property-assessor locate-public-records`, `node zillow-discover.js`, `node har-discover.js`, `node zillow-photos.js`, `node har-photos.js`, `curl`, or `wget` the default core-analysis path. +- From messaging runs, the only subprocess-style step you should attempt by default is the final `scripts/property-assessor render-report` call after the verdict, fair-value range, and report body are complete. +- If the final PDF render fails, return the complete decision-grade report in chat and say the render/send step failed. Do not restart the whole assessment. - Do **not** run `npm install`, `npm ci`, or other dependency-setup commands during a normal chat assessment flow when the local skill already has `node_modules` present. - On this machine, treat `property-assessor` dependencies as already installed. Use the helper entrypoints directly unless the wrapper itself explicitly reports missing local dependencies. - Do **not** use ad hoc shell snippets, heredocs, or inline interpreter eval for public-record or CAD lookup from chat. Avoid forms like `python3 - <<'PY'`, `python -c`, `node -e`, `node --input-type=module -e`, or raw `bash -lc '...'` probes. -- For public-record enrichment, use `scripts/property-assessor locate-public-records`, `scripts/property-assessor assess`, or `web_fetch`. If a one-off fetch helper is truly needed, add a file-based helper under the skill tree first and run that file-based entrypoint instead of inline code. +- For public-record enrichment in chat, prefer `web_fetch` plus official assessor/CAD pages. If a one-off helper is truly needed, add a file-based helper under the skill tree first and use it from a non-messaging surface instead of inline code. ## Source order @@ -75,6 +79,8 @@ Use `web_search` sparingly to discover alternate URLs, then return to `web-autom ## Helper runtime +These helper commands are primarily for local/manual runs, non-chat surfaces, and the final PDF render step. They are **not** the default WhatsApp core-analysis path. + `property-assessor` now includes TypeScript helper commands for: - address-first preliminary assessment assembly - public-record jurisdiction lookup @@ -98,7 +104,7 @@ scripts/property-assessor locate-public-records --address "" scripts/property-assessor render-report --input "" --output "" ``` -`assess` is the address-first entrypoint. It should: +`assess` is the address-first entrypoint for helper-driven runs. It should: - require the assessment purpose - treat the assessment purpose as missing unless it is present in the current request or explicitly confirmed as unchanged from earlier context - resolve official public-record jurisdiction automatically from the address @@ -131,18 +137,18 @@ Default approach: Approval-safe rule: - do not perform CAD/public-record discovery with inline shell or interpreter snippets -- use the built-in TypeScript helper path first -- if the official site needs a plain page fetch, prefer `web_fetch` -- if rendered interaction is unavoidable, use a file-based `web-automation` helper rather than ad hoc shell text +- in chat/messaging runs, prefer `web_fetch` plus official CAD/assessor pages first +- use the built-in TypeScript helper path on local/manual surfaces or for the final PDF render step +- if rendered interaction is unavoidable, use bounded `web-automation` behavior rather than ad hoc shell text -Use the helper CLI first: +Use the helper CLI first on local/manual surfaces: ```bash cd ~/.openclaw/workspace/skills/property-assessor scripts/property-assessor locate-public-records --address "" ``` -When you want the helper to assemble the preliminary assessment payload in one step, use: +When you want the helper to assemble the preliminary assessment payload in one step from a non-messaging surface, use: ```bash cd ~/.openclaw/workspace/skills/property-assessor diff --git a/skills/property-assessor/src/async-timeout.ts b/skills/property-assessor/src/async-timeout.ts new file mode 100644 index 0000000..53d4336 --- /dev/null +++ b/skills/property-assessor/src/async-timeout.ts @@ -0,0 +1,37 @@ +export class TimeoutError extends Error { + readonly timeoutMs: number; + + constructor(operationName: string, timeoutMs: number) { + super(`${operationName} timed out after ${timeoutMs}ms`); + this.name = "TimeoutError"; + this.timeoutMs = timeoutMs; + } +} + +export async function withTimeout( + operation: () => Promise, + { + operationName, + timeoutMs + }: { + operationName: string; + timeoutMs: number; + } +): Promise { + let timer: NodeJS.Timeout | undefined; + + try { + return await Promise.race([ + operation(), + new Promise((_, reject) => { + timer = setTimeout(() => { + reject(new TimeoutError(operationName, timeoutMs)); + }, timeoutMs); + }) + ]); + } finally { + if (timer) { + clearTimeout(timer); + } + } +} diff --git a/skills/property-assessor/src/listing-discovery.ts b/skills/property-assessor/src/listing-discovery.ts index ff28c10..0d4acf4 100644 --- a/skills/property-assessor/src/listing-discovery.ts +++ b/skills/property-assessor/src/listing-discovery.ts @@ -1,5 +1,6 @@ import { discoverHarListing } from "../../web-automation/scripts/har-discover.js"; import { discoverZillowListing } from "../../web-automation/scripts/zillow-discover.js"; +import { TimeoutError, withTimeout } from "./async-timeout.js"; export interface ListingDiscoveryResult { attempts: string[]; @@ -7,29 +8,65 @@ export interface ListingDiscoveryResult { harUrl: string | null; } -export async function discoverListingSources(address: string): Promise { +interface ListingDiscoveryDeps { + timeoutMs?: number; + discoverZillowListingFn?: typeof discoverZillowListing; + discoverHarListingFn?: typeof discoverHarListing; +} + +const DEFAULT_DISCOVERY_TIMEOUT_MS = Number( + process.env.PROPERTY_ASSESSOR_DISCOVERY_TIMEOUT_MS || 20_000 +); + +export async function discoverListingSources( + address: string, + deps: ListingDiscoveryDeps = {} +): Promise { const attempts: string[] = []; let zillowUrl: string | null = null; let harUrl: string | null = null; + const timeoutMs = deps.timeoutMs ?? DEFAULT_DISCOVERY_TIMEOUT_MS; + const discoverZillowListingFn = deps.discoverZillowListingFn || discoverZillowListing; + const discoverHarListingFn = deps.discoverHarListingFn || discoverHarListing; try { - const result = await discoverZillowListing(address); + const result = await withTimeout( + () => discoverZillowListingFn(address), + { + operationName: "Zillow discovery", + timeoutMs + } + ); zillowUrl = result.listingUrl; attempts.push(...result.attempts); } catch (error) { - attempts.push( - `Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}` - ); + if (error instanceof TimeoutError) { + attempts.push(`Zillow discovery timed out after ${timeoutMs}ms.`); + } else { + attempts.push( + `Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}` + ); + } } try { - const result = await discoverHarListing(address); + const result = await withTimeout( + () => discoverHarListingFn(address), + { + operationName: "HAR discovery", + timeoutMs + } + ); harUrl = result.listingUrl; attempts.push(...result.attempts); } catch (error) { - attempts.push( - `HAR discovery failed: ${error instanceof Error ? error.message : String(error)}` - ); + if (error instanceof TimeoutError) { + attempts.push(`HAR discovery timed out after ${timeoutMs}ms.`); + } else { + attempts.push( + `HAR discovery failed: ${error instanceof Error ? error.message : String(error)}` + ); + } } return { diff --git a/skills/property-assessor/src/photo-review.ts b/skills/property-assessor/src/photo-review.ts index 04a78ab..38868b5 100644 --- a/skills/property-assessor/src/photo-review.ts +++ b/skills/property-assessor/src/photo-review.ts @@ -1,5 +1,6 @@ import { extractHarPhotos } from "../../web-automation/scripts/har-photos.js"; import { extractZillowPhotos } from "../../web-automation/scripts/zillow-photos.js"; +import { withTimeout } from "./async-timeout.js"; export type PhotoSource = "zillow" | "har"; @@ -19,12 +20,33 @@ export interface PhotoReviewResolution { discoveredListingUrls: Array<{ label: string; url: string }>; } +interface PhotoReviewDeps { + timeoutMs?: number; + extractZillowPhotosFn?: typeof extractZillowPhotos; + extractHarPhotosFn?: typeof extractHarPhotos; +} + +const DEFAULT_PHOTO_EXTRACTION_TIMEOUT_MS = Number( + process.env.PROPERTY_ASSESSOR_PHOTO_TIMEOUT_MS || 25_000 +); + export async function extractPhotoData( source: PhotoSource, - url: string + url: string, + deps: PhotoReviewDeps = {} ): Promise { + const timeoutMs = deps.timeoutMs ?? DEFAULT_PHOTO_EXTRACTION_TIMEOUT_MS; + const extractZillowPhotosFn = deps.extractZillowPhotosFn || extractZillowPhotos; + const extractHarPhotosFn = deps.extractHarPhotosFn || extractHarPhotos; + if (source === "zillow") { - const payload = await extractZillowPhotos(url); + const payload = await withTimeout( + () => extractZillowPhotosFn(url), + { + operationName: "Zillow photo extraction", + timeoutMs + } + ); return { source, requestedUrl: String(payload.requestedUrl || url), @@ -37,7 +59,13 @@ export async function extractPhotoData( }; } - const payload = await extractHarPhotos(url); + const payload = await withTimeout( + () => extractHarPhotosFn(url), + { + operationName: "HAR photo extraction", + timeoutMs + } + ); return { source, requestedUrl: String(payload.requestedUrl || url), diff --git a/skills/property-assessor/tests/timeout-guards.test.ts b/skills/property-assessor/tests/timeout-guards.test.ts new file mode 100644 index 0000000..a84a7aa --- /dev/null +++ b/skills/property-assessor/tests/timeout-guards.test.ts @@ -0,0 +1,32 @@ +import test from "node:test"; +import assert from "node:assert/strict"; + +import { discoverListingSources } from "../src/listing-discovery.js"; +import { extractPhotoData } from "../src/photo-review.js"; + +test("discoverListingSources times out stalled Zillow and HAR discovery calls", async () => { + const result = await discoverListingSources( + "1011 Ennis Joslin Rd APT 235, Corpus Christi, TX 78412", + { + timeoutMs: 20, + discoverZillowListingFn: async () => await new Promise(() => {}), + discoverHarListingFn: async () => await new Promise(() => {}) + } + ); + + assert.equal(result.zillowUrl, null); + assert.equal(result.harUrl, null); + assert.match(result.attempts.join(" "), /zillow discovery timed out/i); + assert.match(result.attempts.join(" "), /har discovery timed out/i); +}); + +test("extractPhotoData times out a stalled photo extraction instead of hanging forever", async () => { + await assert.rejects( + async () => + extractPhotoData("zillow", "https://www.zillow.com/example", { + timeoutMs: 20, + extractZillowPhotosFn: async () => await new Promise(() => {}) + }), + /timed out/i + ); +}); diff --git a/skills/web-automation/SKILL.md b/skills/web-automation/SKILL.md index 3b09557..9a3f12e 100644 --- a/skills/web-automation/SKILL.md +++ b/skills/web-automation/SKILL.md @@ -76,6 +76,11 @@ pnpm rebuild better-sqlite3 esbuild - Authenticate: `npx tsx auth.ts --url "https://example.com/login"` - Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'` +Messaging rule: +- For WhatsApp or similar chat-driven runs, prefer native `web_search`, `web_fetch`, and bounded browser actions over shelling out to these helper scripts for every core step. +- Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks. +- If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment. + ## OpenClaw Exec Approvals / Allowlist If OpenClaw prompts for exec approval every time this skill runs, add a local approvals allowlist for the main agent before retrying. This is especially helpful for repeated `extract.js`, `browse.ts`, and other CloakBrowser-backed scrapes. @@ -154,6 +159,7 @@ The photo scripts are purpose-built for the common `See all photos` / `Show all - click the all-photos entry point - wait for the resulting photo page or scroller view - extract direct image URLs from the rendered page +- fail fast with a timeout instead of hanging indefinitely when the browser-backed extraction stalls Output is JSON with: - `requestedUrl` diff --git a/skills/web-automation/scripts/har-discover.js b/skills/web-automation/scripts/har-discover.js index 1f331da..f5014cf 100644 --- a/skills/web-automation/scripts/har-discover.js +++ b/skills/web-automation/scripts/har-discover.js @@ -7,6 +7,7 @@ import { dismissCommonOverlays, fail, gotoListing, + runWithOperationTimeout, sleep, } from "./real-estate-photo-common.js"; import { parseAddressIdentity, scoreAddressCandidate } from "./real-estate-address.js"; @@ -64,63 +65,69 @@ export async function discoverHarListing(rawAddress) { const identity = parseAddressIdentity(address); const searchUrl = buildSearchUrl(address); const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" }); + const closeContext = async () => { + await context.close().catch(() => {}); + }; try { - const attempts = [`Opened HAR search URL: ${searchUrl}`]; - await gotoListing(page, searchUrl, 2500); - await dismissCommonOverlays(page); - await sleep(1500); + return await runWithOperationTimeout( + "HAR discovery", + async () => { + const attempts = [`Opened HAR search URL: ${searchUrl}`]; + await gotoListing(page, searchUrl, 2500); + await dismissCommonOverlays(page); + await sleep(1500); - let listingUrl = null; - if (page.url().includes("/homedetail/")) { - const directScore = scoreAddressCandidate( - identity, - `${page.url()} ${(await page.title()) || ""}` - ); - if (directScore.matched) { - listingUrl = normalizeListingUrl(page.url()); - attempts.push("HAR search URL resolved directly to a matching property page."); - } else { - attempts.push("HAR redirected to a property page, but it did not match the requested address closely enough."); - } - } else { - const discovered = await collectListingUrl(page); - const scored = discovered - .map((candidate) => { - const match = scoreAddressCandidate( + let listingUrl = null; + if (page.url().includes("/homedetail/")) { + const directScore = scoreAddressCandidate( identity, - `${candidate.url} ${candidate.text} ${candidate.parentText}` + `${page.url()} ${(await page.title()) || ""}` ); - return { ...candidate, match }; - }) - .sort((a, b) => b.match.score - a.match.score); + if (directScore.matched) { + listingUrl = normalizeListingUrl(page.url()); + attempts.push("HAR search URL resolved directly to a matching property page."); + } else { + attempts.push("HAR redirected to a property page, but it did not match the requested address closely enough."); + } + } else { + const discovered = await collectListingUrl(page); + const scored = discovered + .map((candidate) => { + const match = scoreAddressCandidate( + identity, + `${candidate.url} ${candidate.text} ${candidate.parentText}` + ); + return { ...candidate, match }; + }) + .sort((a, b) => b.match.score - a.match.score); - if (scored[0]?.match.matched) { - listingUrl = normalizeListingUrl(scored[0].url); - attempts.push(`HAR search results exposed a matching homedetail link with score ${scored[0].match.score}.`); - } else { - attempts.push("HAR discovery did not expose a confident homedetail match for this address."); + if (scored[0]?.match.matched) { + listingUrl = normalizeListingUrl(scored[0].url); + attempts.push(`HAR search results exposed a matching homedetail link with score ${scored[0].match.score}.`); + } else { + attempts.push("HAR discovery did not expose a confident homedetail match for this address."); + } + } + + return { + source: "har", + address, + searchUrl, + finalUrl: page.url(), + title: await page.title(), + listingUrl, + attempts, + }; + }, + { + onTimeout: closeContext } - } - - const result = { - source: "har", - address, - searchUrl, - finalUrl: page.url(), - title: await page.title(), - listingUrl, - attempts, - }; - await context.close(); - return result; + ); } catch (error) { - try { - await context.close(); - } catch { - // Ignore close errors after the primary failure. - } throw new Error(`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`); + } finally { + await closeContext(); } } diff --git a/skills/web-automation/scripts/har-photos.js b/skills/web-automation/scripts/har-photos.js index 3ace1ca..075e021 100644 --- a/skills/web-automation/scripts/har-photos.js +++ b/skills/web-automation/scripts/har-photos.js @@ -11,6 +11,7 @@ import { gotoListing, normalizeImageCandidates, parseTarget, + runWithOperationTimeout, scrollUntilSettled, sleep, waitForPhotoExperience, @@ -34,51 +35,56 @@ async function getAnnouncedPhotoCount(page) { export async function extractHarPhotos(rawUrl) { const requestedUrl = parseTarget(rawUrl); const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" }); + const closeContext = async () => { + await context.close().catch(() => {}); + }; try { - await gotoListing(page, requestedUrl); - await dismissCommonOverlays(page); + return await runWithOperationTimeout( + "HAR photo extraction", + async () => { + await gotoListing(page, requestedUrl); + await dismissCommonOverlays(page); - const expectedPhotoCount = await getAnnouncedPhotoCount(page); - const beforeUrl = page.url(); - const clickedLabel = await clickPhotoEntryPoint(page, HAR_LABELS); - await waitForPhotoExperience(page, beforeUrl); - await scrollUntilSettled(page); - await sleep(1200); + const expectedPhotoCount = await getAnnouncedPhotoCount(page); + const beforeUrl = page.url(); + const clickedLabel = await clickPhotoEntryPoint(page, HAR_LABELS); + await waitForPhotoExperience(page, beforeUrl); + await scrollUntilSettled(page); + await sleep(1200); - const candidates = await collectRenderedImageCandidates(page); - const photos = normalizeImageCandidates(candidates, { - hostIncludes: ["pics.harstatic.com", "photos.harstatic.com"], - minWidth: 240, - minHeight: 180, - }); + const candidates = await collectRenderedImageCandidates(page); + const photos = normalizeImageCandidates(candidates, { + hostIncludes: ["pics.harstatic.com", "photos.harstatic.com"], + minWidth: 240, + minHeight: 180, + }); - if (!photos.length) { - fail("HAR photo extraction failed.", "No large image URLs were found after opening the HAR all-photos view."); - } + if (!photos.length) { + fail("HAR photo extraction failed.", "No large image URLs were found after opening the HAR all-photos view."); + } - const result = { - source: "har", - requestedUrl, - finalUrl: page.url(), - title: await page.title(), - clickedLabel, - expectedPhotoCount, - complete: expectedPhotoCount ? photos.length >= expectedPhotoCount : true, - photoCount: photos.length, - imageUrls: photos.map((photo) => photo.url), - notes: ["Opened HAR all-photos flow and extracted large rendered image URLs from the photo page."], - }; - - await context.close(); - return result; + return { + source: "har", + requestedUrl, + finalUrl: page.url(), + title: await page.title(), + clickedLabel, + expectedPhotoCount, + complete: expectedPhotoCount ? photos.length >= expectedPhotoCount : true, + photoCount: photos.length, + imageUrls: photos.map((photo) => photo.url), + notes: ["Opened HAR all-photos flow and extracted large rendered image URLs from the photo page."], + }; + }, + { + onTimeout: closeContext + } + ); } catch (error) { - try { - await context.close(); - } catch { - // Ignore close errors after the primary failure. - } throw new Error(error instanceof Error ? error.message : String(error)); + } finally { + await closeContext(); } } diff --git a/skills/web-automation/scripts/real-estate-photo-common.js b/skills/web-automation/scripts/real-estate-photo-common.js index 912a1e5..b61916e 100644 --- a/skills/web-automation/scripts/real-estate-photo-common.js +++ b/skills/web-automation/scripts/real-estate-photo-common.js @@ -7,6 +7,7 @@ const MAX_SCROLL_PASSES = 12; const SCROLL_PAUSE_MS = 900; const LARGE_IMAGE_MIN_WIDTH = 300; const LARGE_IMAGE_MIN_HEIGHT = 200; +const OPERATION_TIMEOUT_MS = Number(process.env.REAL_ESTATE_OPERATION_TIMEOUT_MS || 25000); export function fail(message, details) { const payload = { error: message }; @@ -38,6 +39,34 @@ export function sleep(ms) { return new Promise((resolve) => setTimeout(resolve, ms)); } +export async function runWithOperationTimeout( + operationName, + operation, + { timeoutMs = OPERATION_TIMEOUT_MS, onTimeout } = {} +) { + let timer; + + try { + return await Promise.race([ + operation(), + new Promise((_, reject) => { + timer = setTimeout(async () => { + try { + await onTimeout?.(); + } catch { + // Ignore cleanup errors; the timeout is the primary failure. + } + reject(new Error(`${operationName} timed out after ${timeoutMs}ms`)); + }, timeoutMs); + }), + ]); + } finally { + if (timer) { + clearTimeout(timer); + } + } +} + export async function loadCloakBrowser() { try { return await import("cloakbrowser"); @@ -289,4 +318,3 @@ export function buildResult({ notes, }; } - diff --git a/skills/web-automation/scripts/real-estate-photo-common.test.mjs b/skills/web-automation/scripts/real-estate-photo-common.test.mjs index 7182c97..033afc3 100644 --- a/skills/web-automation/scripts/real-estate-photo-common.test.mjs +++ b/skills/web-automation/scripts/real-estate-photo-common.test.mjs @@ -1,7 +1,7 @@ import test from "node:test"; import assert from "node:assert/strict"; -import { normalizeImageCandidates } from "./real-estate-photo-common.js"; +import { normalizeImageCandidates, runWithOperationTimeout } from "./real-estate-photo-common.js"; test("normalizeImageCandidates keeps distinct Zillow photo URLs and strips query strings", () => { const result = normalizeImageCandidates( @@ -64,3 +64,24 @@ test("normalizeImageCandidates filters tiny HAR page assets and keeps large phot "https://photos.har.com/123/main.jpg", ]); }); + +test("runWithOperationTimeout rejects stalled work and runs timeout cleanup", async () => { + let cleanedUp = false; + + await assert.rejects( + async () => + runWithOperationTimeout( + "HAR photo extraction", + async () => await new Promise(() => {}), + { + timeoutMs: 20, + onTimeout: async () => { + cleanedUp = true; + }, + } + ), + /timed out/i + ); + + assert.equal(cleanedUp, true); +}); diff --git a/skills/web-automation/scripts/zillow-discover.js b/skills/web-automation/scripts/zillow-discover.js index 1311a8e..eb208d9 100644 --- a/skills/web-automation/scripts/zillow-discover.js +++ b/skills/web-automation/scripts/zillow-discover.js @@ -7,6 +7,7 @@ import { dismissCommonOverlays, fail, gotoListing, + runWithOperationTimeout, sleep, } from "./real-estate-photo-common.js"; import { @@ -68,63 +69,69 @@ export async function discoverZillowListing(rawAddress) { const identity = parseAddressIdentity(address); const searchUrl = `https://www.zillow.com/homes/${encodeURIComponent(buildZillowAddressSlug(address))}_rb/`; const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" }); + const closeContext = async () => { + await context.close().catch(() => {}); + }; try { - const attempts = [`Opened Zillow address search URL: ${searchUrl}`]; - await gotoListing(page, searchUrl, 2500); - await dismissCommonOverlays(page); - await sleep(1500); + return await runWithOperationTimeout( + "Zillow discovery", + async () => { + const attempts = [`Opened Zillow address search URL: ${searchUrl}`]; + await gotoListing(page, searchUrl, 2500); + await dismissCommonOverlays(page); + await sleep(1500); - let listingUrl = null; - if (page.url().includes("/homedetails/")) { - const directScore = scoreAddressCandidate( - identity, - `${page.url()} ${(await page.title()) || ""}` - ); - if (directScore.matched) { - listingUrl = normalizeListingUrl(page.url()); - attempts.push("Zillow search URL resolved directly to a matching property page."); - } else { - attempts.push("Zillow redirected to a property page, but it did not match the requested address closely enough."); - } - } else { - const discovered = await collectListingUrl(page); - const scored = discovered - .map((candidate) => ({ - ...candidate, - match: scoreAddressCandidate( + let listingUrl = null; + if (page.url().includes("/homedetails/")) { + const directScore = scoreAddressCandidate( identity, - `${candidate.url} ${candidate.text} ${candidate.aria} ${candidate.title} ${candidate.parentText}` - ) - })) - .sort((a, b) => b.match.score - a.match.score); + `${page.url()} ${(await page.title()) || ""}` + ); + if (directScore.matched) { + listingUrl = normalizeListingUrl(page.url()); + attempts.push("Zillow search URL resolved directly to a matching property page."); + } else { + attempts.push("Zillow redirected to a property page, but it did not match the requested address closely enough."); + } + } else { + const discovered = await collectListingUrl(page); + const scored = discovered + .map((candidate) => ({ + ...candidate, + match: scoreAddressCandidate( + identity, + `${candidate.url} ${candidate.text} ${candidate.aria} ${candidate.title} ${candidate.parentText}` + ) + })) + .sort((a, b) => b.match.score - a.match.score); - if (scored[0]?.match.matched) { - listingUrl = normalizeListingUrl(scored[0].url); - attempts.push(`Zillow search results exposed a matching homedetails link with score ${scored[0].match.score}.`); - } else { - attempts.push("Zillow discovery did not expose a confident homedetails match for this address."); + if (scored[0]?.match.matched) { + listingUrl = normalizeListingUrl(scored[0].url); + attempts.push(`Zillow search results exposed a matching homedetails link with score ${scored[0].match.score}.`); + } else { + attempts.push("Zillow discovery did not expose a confident homedetails match for this address."); + } + } + + return { + source: "zillow", + address, + searchUrl, + finalUrl: page.url(), + title: await page.title(), + listingUrl, + attempts, + }; + }, + { + onTimeout: closeContext } - } - - const result = { - source: "zillow", - address, - searchUrl, - finalUrl: page.url(), - title: await page.title(), - listingUrl, - attempts, - }; - await context.close(); - return result; + ); } catch (error) { - try { - await context.close(); - } catch { - // Ignore close errors after the primary failure. - } throw new Error(`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`); + } finally { + await closeContext(); } } diff --git a/skills/web-automation/scripts/zillow-photos.js b/skills/web-automation/scripts/zillow-photos.js index b1d42bf..350987e 100644 --- a/skills/web-automation/scripts/zillow-photos.js +++ b/skills/web-automation/scripts/zillow-photos.js @@ -10,6 +10,7 @@ import { gotoListing, normalizeImageCandidates, parseTarget, + runWithOperationTimeout, scrollUntilSettled, sleep, waitForPhotoExperience, @@ -107,77 +108,82 @@ async function collectZillowStructuredPhotoCandidates(page) { export async function extractZillowPhotos(rawUrl) { const requestedUrl = parseTarget(rawUrl); const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" }); + const closeContext = async () => { + await context.close().catch(() => {}); + }; try { - await gotoListing(page, requestedUrl); - await dismissCommonOverlays(page); + return await runWithOperationTimeout( + "Zillow photo extraction", + async () => { + await gotoListing(page, requestedUrl); + await dismissCommonOverlays(page); - const expectedPhotoCount = await getAnnouncedPhotoCount(page); - const beforeUrl = page.url(); - let clickedLabel = null; - let clickError = null; + const expectedPhotoCount = await getAnnouncedPhotoCount(page); + const beforeUrl = page.url(); + let clickedLabel = null; + let clickError = null; - try { - clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS); - await waitForPhotoExperience(page, beforeUrl); - await scrollUntilSettled(page); - await sleep(1200); - } catch (error) { - clickError = error instanceof Error ? error.message : String(error); - } + try { + clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS); + await waitForPhotoExperience(page, beforeUrl); + await scrollUntilSettled(page); + await sleep(1200); + } catch (error) { + clickError = error instanceof Error ? error.message : String(error); + } - const [structuredCandidates, renderedCandidates] = await Promise.all([ - collectZillowStructuredPhotoCandidates(page), - collectZillowPhotoCandidates(page), - ]); - const candidates = [...structuredCandidates, ...renderedCandidates]; - const normalized = normalizeImageCandidates(candidates, { - hostIncludes: ["photos.zillowstatic.com"], - minWidth: 240, - minHeight: 180, - }); - const photos = collapseZillowPhotos(normalized); + const [structuredCandidates, renderedCandidates] = await Promise.all([ + collectZillowStructuredPhotoCandidates(page), + collectZillowPhotoCandidates(page), + ]); + const candidates = [...structuredCandidates, ...renderedCandidates]; + const normalized = normalizeImageCandidates(candidates, { + hostIncludes: ["photos.zillowstatic.com"], + minWidth: 240, + minHeight: 180, + }); + const photos = collapseZillowPhotos(normalized); - if (!photos.length) { - fail( - "Zillow photo extraction failed.", - clickError || "No Zillow image URLs were found on the rendered listing page." - ); - } + if (!photos.length) { + fail( + "Zillow photo extraction failed.", + clickError || "No Zillow image URLs were found on the rendered listing page." + ); + } - const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true; - const notes = []; - if (clickedLabel) { - notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs."); - } else { - notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path."); - } - if (clickError) { - notes.push(`All-photos click path was not required: ${clickError}`); - } + const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true; + const notes = []; + if (clickedLabel) { + notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs."); + } else { + notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path."); + } + if (clickError) { + notes.push(`All-photos click path was not required: ${clickError}`); + } - const result = { - source: "zillow", - requestedUrl, - finalUrl: page.url(), - title: await page.title(), - clickedLabel, - expectedPhotoCount, - complete, - photoCount: photos.length, - imageUrls: photos.map((photo) => photo.url), - notes, - }; - - await context.close(); - return result; + return { + source: "zillow", + requestedUrl, + finalUrl: page.url(), + title: await page.title(), + clickedLabel, + expectedPhotoCount, + complete, + photoCount: photos.length, + imageUrls: photos.map((photo) => photo.url), + notes, + }; + }, + { + onTimeout: closeContext + } + ); } catch (error) { - try { - await context.close(); - } catch { - // Ignore close errors after the primary failure. - } throw new Error(error instanceof Error ? error.message : String(error)); + } finally { + await closeContext(); } }