fix: make property-assessor safer for whatsapp runs

This commit is contained in:
2026-03-28 01:28:59 -05:00
parent 2deeb31369
commit 3d7ce7617c
15 changed files with 640 additions and 217 deletions

View File

@@ -0,0 +1,180 @@
# Property Assessor WhatsApp-Safe Runtime Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Keep WhatsApp property-assessor runs moving by failing fast on silent discovery/photo hangs, avoiding helper subprocesses during core analysis, and reserving subprocess use to the final PDF render attempt only.
**Architecture:** Add small timeout guards around the existing in-process listing discovery and photo extraction calls so one quiet browser-backed task cannot stall the entire assessment. Tighten the live skill and published docs so messaging runs treat chat-native source collection as the default path and helper commands as non-chat or final-render-only tools.
**Tech Stack:** TypeScript, Node test runner, existing OpenClaw property-assessor skill, existing OpenClaw web-automation modules, Markdown docs.
---
### Task 1: Add failing timeout tests for discovery and photo extraction
**Files:**
- Modify: `skills/property-assessor/tests/assessment.test.ts`
**Step 1: Write the failing test**
Add tests that stub discovery/photo functions with never-resolving promises and assert that:
- listing discovery returns `null` URLs and records timeout attempts
- photo extraction returns `not completed` instead of hanging forever
**Step 2: Run test to verify it fails**
Run: `npm test -- --test-name-pattern "times out"`
Expected: FAIL because current code never times out or records timeout attempts.
**Step 3: Write minimal implementation**
Implement the smallest timeout wrapper needed for the tests to pass.
**Step 4: Run test to verify it passes**
Run: `npm test -- --test-name-pattern "times out"`
Expected: PASS
**Step 5: Commit**
```bash
git add skills/property-assessor/tests/assessment.test.ts
git commit -m "test: cover stalled discovery and photo extraction"
```
### Task 2: Implement hard timeout guards in the live assessment path
**Files:**
- Create: `skills/property-assessor/src/async-timeout.ts`
- Modify: `skills/property-assessor/src/listing-discovery.ts`
- Modify: `skills/property-assessor/src/photo-review.ts`
**Step 1: Write the failing test**
Use the tests from Task 1 as the red phase.
**Step 2: Run test to verify it fails**
Run: `npm test -- --test-name-pattern "times out"`
Expected: FAIL
**Step 3: Write minimal implementation**
Add:
- a shared timeout helper for async operations
- timeout-wrapped Zillow/HAR discovery in `listing-discovery.ts`
- timeout-wrapped Zillow/HAR photo extraction in `photo-review.ts`
- clear timeout attempt messages so the assessment can continue honestly
**Step 4: Run test to verify it passes**
Run: `npm test -- --test-name-pattern "times out"`
Expected: PASS
**Step 5: Commit**
```bash
git add skills/property-assessor/src/async-timeout.ts skills/property-assessor/src/listing-discovery.ts skills/property-assessor/src/photo-review.ts skills/property-assessor/tests/assessment.test.ts
git commit -m "fix: fail fast on stalled property-assessor extraction steps"
```
### Task 3: Tighten live skill instructions for WhatsApp-safe execution
**Files:**
- Modify: `../skills/property-assessor/SKILL.md`
**Step 1: Write the failing test**
No automated test. Use the documented runtime rule as the spec:
- WhatsApp/messaging runs must avoid helper subprocesses for core analysis
- only the final PDF render attempt may use the helper subprocess path
- `update?` must remain status-only
**Step 2: Run verification to confirm current docs are wrong**
Run: `rg -n "scripts/property-assessor assess|node zillow-photos|node har-photos|Good:" ../skills/property-assessor/SKILL.md`
Expected: current doc still presents helper commands as normal chat-safe core workflow.
**Step 3: Write minimal implementation**
Update the live skill doc to:
- prefer `web_search`, `web_fetch`, and bounded `web-automation` for core assessment
- forbid `scripts/property-assessor assess`, `node zillow-photos.js`, `node har-photos.js`, and ad hoc `curl` as the default WhatsApp core path
- allow a single final PDF render attempt only after a decision-grade verdict exists
**Step 4: Run verification**
Run: `sed -n '1,220p' ../skills/property-assessor/SKILL.md`
Expected: the WhatsApp-safe runtime rules are explicit and unambiguous.
**Step 5: Commit**
```bash
git add ../skills/property-assessor/SKILL.md
git commit -m "docs: clarify whatsapp-safe property-assessor execution"
```
### Task 4: Mirror the runtime guidance into the published repo docs
**Files:**
- Modify: `docs/property-assessor.md`
- Modify: `docs/web-automation.md`
**Step 1: Write the failing test**
No automated test. The spec is consistency with the live skill instructions.
**Step 2: Run verification to confirm current docs drift**
Run: `rg -n "node zillow-photos|node har-photos|assess --address" docs/property-assessor.md docs/web-automation.md`
Expected: current docs still imply subprocess-heavy commands are the standard chat path.
**Step 3: Write minimal implementation**
Document:
- chat-native assessment first
- timeout-protected discovery/photo extraction behavior
- final-render-only subprocess attempt from messaging runs
**Step 4: Run verification**
Run: `sed -n '1,220p' docs/property-assessor.md && sed -n '1,220p' docs/web-automation.md`
Expected: published docs match the live skill behavior.
**Step 5: Commit**
```bash
git add docs/property-assessor.md docs/web-automation.md
git commit -m "docs: document whatsapp-safe property assessment flow"
```
### Task 5: Verify the focused runtime behavior
**Files:**
- Modify: `skills/property-assessor/tests/assessment.test.ts`
- Verify: `skills/property-assessor/src/*.ts`
- Verify: `../skills/property-assessor/SKILL.md`
- Verify: `docs/property-assessor.md`
- Verify: `docs/web-automation.md`
**Step 1: Run focused tests**
Run: `npm test`
Expected: all `property-assessor` tests pass, including timeout coverage.
**Step 2: Run targeted source verification**
Run: `rg -n "withTimeout|timed out|final PDF render" skills/property-assessor/src ../skills/property-assessor/SKILL.md docs/property-assessor.md docs/web-automation.md`
Expected: timeout guards and the final-render-only messaging rule are present.
**Step 3: Inspect git status**
Run: `git status --short`
Expected: only intended files are modified.
**Step 4: Commit**
```bash
git add skills/property-assessor/src/async-timeout.ts skills/property-assessor/src/listing-discovery.ts skills/property-assessor/src/photo-review.ts skills/property-assessor/tests/assessment.test.ts ../skills/property-assessor/SKILL.md docs/property-assessor.md docs/web-automation.md docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md
git commit -m "fix: make property-assessor safer for whatsapp runs"
```

View File

@@ -67,6 +67,9 @@ Operational rule:
- Preliminary helper output should be treated as structured scaffolding for the remaining work, not as a reason to stop and wait for another user nudge.
- In chat/messaging runs, do not waste the turn on `npm install` or `npm ci` when the local skill dependencies are already present.
- If the user asks `update?` or `and?` mid-run, treat that as a status request and continue the same assessment rather than restarting or stopping at the last checkpoint.
- In WhatsApp or similar messaging runs, keep the core analysis on native `web_search`, `web_fetch`, and bounded browser actions. Do not make helper subprocesses the default path for discovery, CAD lookup, or photo extraction.
- In those messaging runs, reserve subprocess use for a single final `render-report` attempt after the verdict and fair-value range are complete.
- If final PDF render/send fails, return the completed decision-grade report in chat and report delivery failure separately rather than restarting the whole assessment.
### `assess`
@@ -84,6 +87,7 @@ Current behavior:
- automatically tries to discover Zillow and HAR listing URLs from the address when no listing URL is provided
- runs Zillow photo extraction first, then HAR as fallback when available
- reuses the OpenClaw web-automation logic in-process instead of spawning nested helper commands
- fails fast when Zillow/HAR discovery or photo extraction stalls instead of hanging indefinitely
- returns a structured preliminary report payload
- does not require recipient email(s) for the analysis-only run
- asks for recipient email(s) only when PDF rendering is explicitly requested
@@ -226,7 +230,9 @@ When those extractors return `imageUrls`, that returned set is the photo-review
## Approval-safe command shape
For chat-driven runs, prefer file-based commands.
For local/manual runs, prefer file-based commands.
For chat-driven messaging runs, prefer native `web_search`, `web_fetch`, and bounded browser actions first. Treat these commands as local/manual helpers or non-chat fallbacks, not as the default core-analysis path.
Good:
@@ -240,6 +246,13 @@ Good:
- `scripts/property-assessor render-report --input ... --output ...`
- `web_fetch` to read an official CAD / assessor page when the TypeScript helper needs a plain page fetch
Messaging default:
- `web_search` to discover listing and public-record URLs
- `web_fetch` for official CAD / assessor pages and accessible listing pages
- bounded `web-automation` actions for rendered listing/photo views
- one final `scripts/property-assessor render-report ...` attempt only after the decision-grade report is complete
Avoid when possible:
- `node -e "..."`

View File

@@ -18,6 +18,11 @@ Automated web browsing and scraping using Playwright-compatible CloakBrowser, wi
- Use `node skills/web-automation/scripts/zillow-discover.js "<street-address>"` or `har-discover.js` to resolve a real-estate listing URL from an address
- Use `node skills/web-automation/scripts/zillow-photos.js "<listing-url>"` or `har-photos.js` for real-estate photo extraction before attempting generic gallery automation
Messaging rule:
- For WhatsApp or similar chat-driven runs, prefer native `web_search`, `web_fetch`, and bounded browser actions over shelling out to helper scripts for every core step.
- Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks.
- If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment.
## Requirements
- Node.js 20+
@@ -148,6 +153,7 @@ What it does:
- resolves directly to a property page when Zillow supports the address slug
- otherwise looks for a `homedetails` listing link in the rendered page
- returns the discovered listing URL as JSON
- fails fast with a timeout if the browser-backed discovery stalls
### HAR discovery
@@ -161,6 +167,7 @@ What it does:
- looks for a confident `homedetail` match in rendered results
- returns the discovered listing URL when HAR exposes a strong enough match
- returns `listingUrl: null` when HAR discovery is not confident enough
- fails fast with a timeout if the browser-backed discovery stalls
### Zillow
@@ -174,6 +181,7 @@ What it does:
- tries the `See all photos` / `See all X photos` entry point
- if Zillow keeps the click path flaky, falls back to the listing's embedded `__NEXT_DATA__` payload
- returns direct `photos.zillowstatic.com` image URLs as JSON
- fails fast with a timeout if the browser-backed extraction stalls
Expected success shape:
- `complete: true`
@@ -191,6 +199,7 @@ What it does:
- opens the HAR listing page
- clicks `Show all photos` / `View all photos`
- extracts the direct `pics.harstatic.com` image URLs from the all-photos page
- fails fast with a timeout if the browser-backed extraction stalls
Expected success shape:
- `complete: true`

View File

@@ -56,10 +56,14 @@ Rules:
- Avoid fragile interactive gallery flows if they are likely to require approval or bounce the user into Control UI.
- If a richer photo pass would require approval, do not silently force that path first. Continue with the best approval-free workflow available and clearly lower confidence if needed.
- Only escalate to approval-heavy browser interaction when there is no reasonable alternative and the extra fidelity materially changes the assessment.
- In WhatsApp or similar messaging runs, keep the core assessment on native chat-safe tools: `web_search`, `web_fetch`, and bounded `web-automation` browsing/extraction.
- In those messaging runs, do **not** make `scripts/property-assessor assess`, `scripts/property-assessor locate-public-records`, `node zillow-discover.js`, `node har-discover.js`, `node zillow-photos.js`, `node har-photos.js`, `curl`, or `wget` the default core-analysis path.
- From messaging runs, the only subprocess-style step you should attempt by default is the final `scripts/property-assessor render-report` call after the verdict, fair-value range, and report body are complete.
- If the final PDF render fails, return the complete decision-grade report in chat and say the render/send step failed. Do not restart the whole assessment.
- Do **not** run `npm install`, `npm ci`, or other dependency-setup commands during a normal chat assessment flow when the local skill already has `node_modules` present.
- On this machine, treat `property-assessor` dependencies as already installed. Use the helper entrypoints directly unless the wrapper itself explicitly reports missing local dependencies.
- Do **not** use ad hoc shell snippets, heredocs, or inline interpreter eval for public-record or CAD lookup from chat. Avoid forms like `python3 - <<'PY'`, `python -c`, `node -e`, `node --input-type=module -e`, or raw `bash -lc '...'` probes.
- For public-record enrichment, use `scripts/property-assessor locate-public-records`, `scripts/property-assessor assess`, or `web_fetch`. If a one-off fetch helper is truly needed, add a file-based helper under the skill tree first and run that file-based entrypoint instead of inline code.
- For public-record enrichment in chat, prefer `web_fetch` plus official assessor/CAD pages. If a one-off helper is truly needed, add a file-based helper under the skill tree first and use it from a non-messaging surface instead of inline code.
## Source order
@@ -75,6 +79,8 @@ Use `web_search` sparingly to discover alternate URLs, then return to `web-autom
## Helper runtime
These helper commands are primarily for local/manual runs, non-chat surfaces, and the final PDF render step. They are **not** the default WhatsApp core-analysis path.
`property-assessor` now includes TypeScript helper commands for:
- address-first preliminary assessment assembly
- public-record jurisdiction lookup
@@ -98,7 +104,7 @@ scripts/property-assessor locate-public-records --address "<street-address>"
scripts/property-assessor render-report --input "<report-payload-json>" --output "<output-pdf>"
```
`assess` is the address-first entrypoint. It should:
`assess` is the address-first entrypoint for helper-driven runs. It should:
- require the assessment purpose
- treat the assessment purpose as missing unless it is present in the current request or explicitly confirmed as unchanged from earlier context
- resolve official public-record jurisdiction automatically from the address
@@ -131,18 +137,18 @@ Default approach:
Approval-safe rule:
- do not perform CAD/public-record discovery with inline shell or interpreter snippets
- use the built-in TypeScript helper path first
- if the official site needs a plain page fetch, prefer `web_fetch`
- if rendered interaction is unavoidable, use a file-based `web-automation` helper rather than ad hoc shell text
- in chat/messaging runs, prefer `web_fetch` plus official CAD/assessor pages first
- use the built-in TypeScript helper path on local/manual surfaces or for the final PDF render step
- if rendered interaction is unavoidable, use bounded `web-automation` behavior rather than ad hoc shell text
Use the helper CLI first:
Use the helper CLI first on local/manual surfaces:
```bash
cd ~/.openclaw/workspace/skills/property-assessor
scripts/property-assessor locate-public-records --address "<street-address>"
```
When you want the helper to assemble the preliminary assessment payload in one step, use:
When you want the helper to assemble the preliminary assessment payload in one step from a non-messaging surface, use:
```bash
cd ~/.openclaw/workspace/skills/property-assessor

View File

@@ -0,0 +1,37 @@
export class TimeoutError extends Error {
readonly timeoutMs: number;
constructor(operationName: string, timeoutMs: number) {
super(`${operationName} timed out after ${timeoutMs}ms`);
this.name = "TimeoutError";
this.timeoutMs = timeoutMs;
}
}
export async function withTimeout<T>(
operation: () => Promise<T>,
{
operationName,
timeoutMs
}: {
operationName: string;
timeoutMs: number;
}
): Promise<T> {
let timer: NodeJS.Timeout | undefined;
try {
return await Promise.race([
operation(),
new Promise<T>((_, reject) => {
timer = setTimeout(() => {
reject(new TimeoutError(operationName, timeoutMs));
}, timeoutMs);
})
]);
} finally {
if (timer) {
clearTimeout(timer);
}
}
}

View File

@@ -1,5 +1,6 @@
import { discoverHarListing } from "../../web-automation/scripts/har-discover.js";
import { discoverZillowListing } from "../../web-automation/scripts/zillow-discover.js";
import { TimeoutError, withTimeout } from "./async-timeout.js";
export interface ListingDiscoveryResult {
attempts: string[];
@@ -7,29 +8,65 @@ export interface ListingDiscoveryResult {
harUrl: string | null;
}
export async function discoverListingSources(address: string): Promise<ListingDiscoveryResult> {
interface ListingDiscoveryDeps {
timeoutMs?: number;
discoverZillowListingFn?: typeof discoverZillowListing;
discoverHarListingFn?: typeof discoverHarListing;
}
const DEFAULT_DISCOVERY_TIMEOUT_MS = Number(
process.env.PROPERTY_ASSESSOR_DISCOVERY_TIMEOUT_MS || 20_000
);
export async function discoverListingSources(
address: string,
deps: ListingDiscoveryDeps = {}
): Promise<ListingDiscoveryResult> {
const attempts: string[] = [];
let zillowUrl: string | null = null;
let harUrl: string | null = null;
const timeoutMs = deps.timeoutMs ?? DEFAULT_DISCOVERY_TIMEOUT_MS;
const discoverZillowListingFn = deps.discoverZillowListingFn || discoverZillowListing;
const discoverHarListingFn = deps.discoverHarListingFn || discoverHarListing;
try {
const result = await discoverZillowListing(address);
const result = await withTimeout(
() => discoverZillowListingFn(address),
{
operationName: "Zillow discovery",
timeoutMs
}
);
zillowUrl = result.listingUrl;
attempts.push(...result.attempts);
} catch (error) {
attempts.push(
`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`
);
if (error instanceof TimeoutError) {
attempts.push(`Zillow discovery timed out after ${timeoutMs}ms.`);
} else {
attempts.push(
`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`
);
}
}
try {
const result = await discoverHarListing(address);
const result = await withTimeout(
() => discoverHarListingFn(address),
{
operationName: "HAR discovery",
timeoutMs
}
);
harUrl = result.listingUrl;
attempts.push(...result.attempts);
} catch (error) {
attempts.push(
`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`
);
if (error instanceof TimeoutError) {
attempts.push(`HAR discovery timed out after ${timeoutMs}ms.`);
} else {
attempts.push(
`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`
);
}
}
return {

View File

@@ -1,5 +1,6 @@
import { extractHarPhotos } from "../../web-automation/scripts/har-photos.js";
import { extractZillowPhotos } from "../../web-automation/scripts/zillow-photos.js";
import { withTimeout } from "./async-timeout.js";
export type PhotoSource = "zillow" | "har";
@@ -19,12 +20,33 @@ export interface PhotoReviewResolution {
discoveredListingUrls: Array<{ label: string; url: string }>;
}
interface PhotoReviewDeps {
timeoutMs?: number;
extractZillowPhotosFn?: typeof extractZillowPhotos;
extractHarPhotosFn?: typeof extractHarPhotos;
}
const DEFAULT_PHOTO_EXTRACTION_TIMEOUT_MS = Number(
process.env.PROPERTY_ASSESSOR_PHOTO_TIMEOUT_MS || 25_000
);
export async function extractPhotoData(
source: PhotoSource,
url: string
url: string,
deps: PhotoReviewDeps = {}
): Promise<PhotoExtractionResult> {
const timeoutMs = deps.timeoutMs ?? DEFAULT_PHOTO_EXTRACTION_TIMEOUT_MS;
const extractZillowPhotosFn = deps.extractZillowPhotosFn || extractZillowPhotos;
const extractHarPhotosFn = deps.extractHarPhotosFn || extractHarPhotos;
if (source === "zillow") {
const payload = await extractZillowPhotos(url);
const payload = await withTimeout(
() => extractZillowPhotosFn(url),
{
operationName: "Zillow photo extraction",
timeoutMs
}
);
return {
source,
requestedUrl: String(payload.requestedUrl || url),
@@ -37,7 +59,13 @@ export async function extractPhotoData(
};
}
const payload = await extractHarPhotos(url);
const payload = await withTimeout(
() => extractHarPhotosFn(url),
{
operationName: "HAR photo extraction",
timeoutMs
}
);
return {
source,
requestedUrl: String(payload.requestedUrl || url),

View File

@@ -0,0 +1,32 @@
import test from "node:test";
import assert from "node:assert/strict";
import { discoverListingSources } from "../src/listing-discovery.js";
import { extractPhotoData } from "../src/photo-review.js";
test("discoverListingSources times out stalled Zillow and HAR discovery calls", async () => {
const result = await discoverListingSources(
"1011 Ennis Joslin Rd APT 235, Corpus Christi, TX 78412",
{
timeoutMs: 20,
discoverZillowListingFn: async () => await new Promise(() => {}),
discoverHarListingFn: async () => await new Promise(() => {})
}
);
assert.equal(result.zillowUrl, null);
assert.equal(result.harUrl, null);
assert.match(result.attempts.join(" "), /zillow discovery timed out/i);
assert.match(result.attempts.join(" "), /har discovery timed out/i);
});
test("extractPhotoData times out a stalled photo extraction instead of hanging forever", async () => {
await assert.rejects(
async () =>
extractPhotoData("zillow", "https://www.zillow.com/example", {
timeoutMs: 20,
extractZillowPhotosFn: async () => await new Promise(() => {})
}),
/timed out/i
);
});

View File

@@ -76,6 +76,11 @@ pnpm rebuild better-sqlite3 esbuild
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`
Messaging rule:
- For WhatsApp or similar chat-driven runs, prefer native `web_search`, `web_fetch`, and bounded browser actions over shelling out to these helper scripts for every core step.
- Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks.
- If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment.
## OpenClaw Exec Approvals / Allowlist
If OpenClaw prompts for exec approval every time this skill runs, add a local approvals allowlist for the main agent before retrying. This is especially helpful for repeated `extract.js`, `browse.ts`, and other CloakBrowser-backed scrapes.
@@ -154,6 +159,7 @@ The photo scripts are purpose-built for the common `See all photos` / `Show all
- click the all-photos entry point
- wait for the resulting photo page or scroller view
- extract direct image URLs from the rendered page
- fail fast with a timeout instead of hanging indefinitely when the browser-backed extraction stalls
Output is JSON with:
- `requestedUrl`

View File

@@ -7,6 +7,7 @@ import {
dismissCommonOverlays,
fail,
gotoListing,
runWithOperationTimeout,
sleep,
} from "./real-estate-photo-common.js";
import { parseAddressIdentity, scoreAddressCandidate } from "./real-estate-address.js";
@@ -64,63 +65,69 @@ export async function discoverHarListing(rawAddress) {
const identity = parseAddressIdentity(address);
const searchUrl = buildSearchUrl(address);
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
const closeContext = async () => {
await context.close().catch(() => {});
};
try {
const attempts = [`Opened HAR search URL: ${searchUrl}`];
await gotoListing(page, searchUrl, 2500);
await dismissCommonOverlays(page);
await sleep(1500);
return await runWithOperationTimeout(
"HAR discovery",
async () => {
const attempts = [`Opened HAR search URL: ${searchUrl}`];
await gotoListing(page, searchUrl, 2500);
await dismissCommonOverlays(page);
await sleep(1500);
let listingUrl = null;
if (page.url().includes("/homedetail/")) {
const directScore = scoreAddressCandidate(
identity,
`${page.url()} ${(await page.title()) || ""}`
);
if (directScore.matched) {
listingUrl = normalizeListingUrl(page.url());
attempts.push("HAR search URL resolved directly to a matching property page.");
} else {
attempts.push("HAR redirected to a property page, but it did not match the requested address closely enough.");
}
} else {
const discovered = await collectListingUrl(page);
const scored = discovered
.map((candidate) => {
const match = scoreAddressCandidate(
let listingUrl = null;
if (page.url().includes("/homedetail/")) {
const directScore = scoreAddressCandidate(
identity,
`${candidate.url} ${candidate.text} ${candidate.parentText}`
`${page.url()} ${(await page.title()) || ""}`
);
return { ...candidate, match };
})
.sort((a, b) => b.match.score - a.match.score);
if (directScore.matched) {
listingUrl = normalizeListingUrl(page.url());
attempts.push("HAR search URL resolved directly to a matching property page.");
} else {
attempts.push("HAR redirected to a property page, but it did not match the requested address closely enough.");
}
} else {
const discovered = await collectListingUrl(page);
const scored = discovered
.map((candidate) => {
const match = scoreAddressCandidate(
identity,
`${candidate.url} ${candidate.text} ${candidate.parentText}`
);
return { ...candidate, match };
})
.sort((a, b) => b.match.score - a.match.score);
if (scored[0]?.match.matched) {
listingUrl = normalizeListingUrl(scored[0].url);
attempts.push(`HAR search results exposed a matching homedetail link with score ${scored[0].match.score}.`);
} else {
attempts.push("HAR discovery did not expose a confident homedetail match for this address.");
if (scored[0]?.match.matched) {
listingUrl = normalizeListingUrl(scored[0].url);
attempts.push(`HAR search results exposed a matching homedetail link with score ${scored[0].match.score}.`);
} else {
attempts.push("HAR discovery did not expose a confident homedetail match for this address.");
}
}
return {
source: "har",
address,
searchUrl,
finalUrl: page.url(),
title: await page.title(),
listingUrl,
attempts,
};
},
{
onTimeout: closeContext
}
}
const result = {
source: "har",
address,
searchUrl,
finalUrl: page.url(),
title: await page.title(),
listingUrl,
attempts,
};
await context.close();
return result;
);
} catch (error) {
try {
await context.close();
} catch {
// Ignore close errors after the primary failure.
}
throw new Error(`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`);
} finally {
await closeContext();
}
}

View File

@@ -11,6 +11,7 @@ import {
gotoListing,
normalizeImageCandidates,
parseTarget,
runWithOperationTimeout,
scrollUntilSettled,
sleep,
waitForPhotoExperience,
@@ -34,51 +35,56 @@ async function getAnnouncedPhotoCount(page) {
export async function extractHarPhotos(rawUrl) {
const requestedUrl = parseTarget(rawUrl);
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
const closeContext = async () => {
await context.close().catch(() => {});
};
try {
await gotoListing(page, requestedUrl);
await dismissCommonOverlays(page);
return await runWithOperationTimeout(
"HAR photo extraction",
async () => {
await gotoListing(page, requestedUrl);
await dismissCommonOverlays(page);
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
const beforeUrl = page.url();
const clickedLabel = await clickPhotoEntryPoint(page, HAR_LABELS);
await waitForPhotoExperience(page, beforeUrl);
await scrollUntilSettled(page);
await sleep(1200);
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
const beforeUrl = page.url();
const clickedLabel = await clickPhotoEntryPoint(page, HAR_LABELS);
await waitForPhotoExperience(page, beforeUrl);
await scrollUntilSettled(page);
await sleep(1200);
const candidates = await collectRenderedImageCandidates(page);
const photos = normalizeImageCandidates(candidates, {
hostIncludes: ["pics.harstatic.com", "photos.harstatic.com"],
minWidth: 240,
minHeight: 180,
});
const candidates = await collectRenderedImageCandidates(page);
const photos = normalizeImageCandidates(candidates, {
hostIncludes: ["pics.harstatic.com", "photos.harstatic.com"],
minWidth: 240,
minHeight: 180,
});
if (!photos.length) {
fail("HAR photo extraction failed.", "No large image URLs were found after opening the HAR all-photos view.");
}
if (!photos.length) {
fail("HAR photo extraction failed.", "No large image URLs were found after opening the HAR all-photos view.");
}
const result = {
source: "har",
requestedUrl,
finalUrl: page.url(),
title: await page.title(),
clickedLabel,
expectedPhotoCount,
complete: expectedPhotoCount ? photos.length >= expectedPhotoCount : true,
photoCount: photos.length,
imageUrls: photos.map((photo) => photo.url),
notes: ["Opened HAR all-photos flow and extracted large rendered image URLs from the photo page."],
};
await context.close();
return result;
return {
source: "har",
requestedUrl,
finalUrl: page.url(),
title: await page.title(),
clickedLabel,
expectedPhotoCount,
complete: expectedPhotoCount ? photos.length >= expectedPhotoCount : true,
photoCount: photos.length,
imageUrls: photos.map((photo) => photo.url),
notes: ["Opened HAR all-photos flow and extracted large rendered image URLs from the photo page."],
};
},
{
onTimeout: closeContext
}
);
} catch (error) {
try {
await context.close();
} catch {
// Ignore close errors after the primary failure.
}
throw new Error(error instanceof Error ? error.message : String(error));
} finally {
await closeContext();
}
}

View File

@@ -7,6 +7,7 @@ const MAX_SCROLL_PASSES = 12;
const SCROLL_PAUSE_MS = 900;
const LARGE_IMAGE_MIN_WIDTH = 300;
const LARGE_IMAGE_MIN_HEIGHT = 200;
const OPERATION_TIMEOUT_MS = Number(process.env.REAL_ESTATE_OPERATION_TIMEOUT_MS || 25000);
export function fail(message, details) {
const payload = { error: message };
@@ -38,6 +39,34 @@ export function sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
export async function runWithOperationTimeout(
operationName,
operation,
{ timeoutMs = OPERATION_TIMEOUT_MS, onTimeout } = {}
) {
let timer;
try {
return await Promise.race([
operation(),
new Promise((_, reject) => {
timer = setTimeout(async () => {
try {
await onTimeout?.();
} catch {
// Ignore cleanup errors; the timeout is the primary failure.
}
reject(new Error(`${operationName} timed out after ${timeoutMs}ms`));
}, timeoutMs);
}),
]);
} finally {
if (timer) {
clearTimeout(timer);
}
}
}
export async function loadCloakBrowser() {
try {
return await import("cloakbrowser");
@@ -289,4 +318,3 @@ export function buildResult({
notes,
};
}

View File

@@ -1,7 +1,7 @@
import test from "node:test";
import assert from "node:assert/strict";
import { normalizeImageCandidates } from "./real-estate-photo-common.js";
import { normalizeImageCandidates, runWithOperationTimeout } from "./real-estate-photo-common.js";
test("normalizeImageCandidates keeps distinct Zillow photo URLs and strips query strings", () => {
const result = normalizeImageCandidates(
@@ -64,3 +64,24 @@ test("normalizeImageCandidates filters tiny HAR page assets and keeps large phot
"https://photos.har.com/123/main.jpg",
]);
});
test("runWithOperationTimeout rejects stalled work and runs timeout cleanup", async () => {
let cleanedUp = false;
await assert.rejects(
async () =>
runWithOperationTimeout(
"HAR photo extraction",
async () => await new Promise(() => {}),
{
timeoutMs: 20,
onTimeout: async () => {
cleanedUp = true;
},
}
),
/timed out/i
);
assert.equal(cleanedUp, true);
});

View File

@@ -7,6 +7,7 @@ import {
dismissCommonOverlays,
fail,
gotoListing,
runWithOperationTimeout,
sleep,
} from "./real-estate-photo-common.js";
import {
@@ -68,63 +69,69 @@ export async function discoverZillowListing(rawAddress) {
const identity = parseAddressIdentity(address);
const searchUrl = `https://www.zillow.com/homes/${encodeURIComponent(buildZillowAddressSlug(address))}_rb/`;
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
const closeContext = async () => {
await context.close().catch(() => {});
};
try {
const attempts = [`Opened Zillow address search URL: ${searchUrl}`];
await gotoListing(page, searchUrl, 2500);
await dismissCommonOverlays(page);
await sleep(1500);
return await runWithOperationTimeout(
"Zillow discovery",
async () => {
const attempts = [`Opened Zillow address search URL: ${searchUrl}`];
await gotoListing(page, searchUrl, 2500);
await dismissCommonOverlays(page);
await sleep(1500);
let listingUrl = null;
if (page.url().includes("/homedetails/")) {
const directScore = scoreAddressCandidate(
identity,
`${page.url()} ${(await page.title()) || ""}`
);
if (directScore.matched) {
listingUrl = normalizeListingUrl(page.url());
attempts.push("Zillow search URL resolved directly to a matching property page.");
} else {
attempts.push("Zillow redirected to a property page, but it did not match the requested address closely enough.");
}
} else {
const discovered = await collectListingUrl(page);
const scored = discovered
.map((candidate) => ({
...candidate,
match: scoreAddressCandidate(
let listingUrl = null;
if (page.url().includes("/homedetails/")) {
const directScore = scoreAddressCandidate(
identity,
`${candidate.url} ${candidate.text} ${candidate.aria} ${candidate.title} ${candidate.parentText}`
)
}))
.sort((a, b) => b.match.score - a.match.score);
`${page.url()} ${(await page.title()) || ""}`
);
if (directScore.matched) {
listingUrl = normalizeListingUrl(page.url());
attempts.push("Zillow search URL resolved directly to a matching property page.");
} else {
attempts.push("Zillow redirected to a property page, but it did not match the requested address closely enough.");
}
} else {
const discovered = await collectListingUrl(page);
const scored = discovered
.map((candidate) => ({
...candidate,
match: scoreAddressCandidate(
identity,
`${candidate.url} ${candidate.text} ${candidate.aria} ${candidate.title} ${candidate.parentText}`
)
}))
.sort((a, b) => b.match.score - a.match.score);
if (scored[0]?.match.matched) {
listingUrl = normalizeListingUrl(scored[0].url);
attempts.push(`Zillow search results exposed a matching homedetails link with score ${scored[0].match.score}.`);
} else {
attempts.push("Zillow discovery did not expose a confident homedetails match for this address.");
if (scored[0]?.match.matched) {
listingUrl = normalizeListingUrl(scored[0].url);
attempts.push(`Zillow search results exposed a matching homedetails link with score ${scored[0].match.score}.`);
} else {
attempts.push("Zillow discovery did not expose a confident homedetails match for this address.");
}
}
return {
source: "zillow",
address,
searchUrl,
finalUrl: page.url(),
title: await page.title(),
listingUrl,
attempts,
};
},
{
onTimeout: closeContext
}
}
const result = {
source: "zillow",
address,
searchUrl,
finalUrl: page.url(),
title: await page.title(),
listingUrl,
attempts,
};
await context.close();
return result;
);
} catch (error) {
try {
await context.close();
} catch {
// Ignore close errors after the primary failure.
}
throw new Error(`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`);
} finally {
await closeContext();
}
}

View File

@@ -10,6 +10,7 @@ import {
gotoListing,
normalizeImageCandidates,
parseTarget,
runWithOperationTimeout,
scrollUntilSettled,
sleep,
waitForPhotoExperience,
@@ -107,77 +108,82 @@ async function collectZillowStructuredPhotoCandidates(page) {
export async function extractZillowPhotos(rawUrl) {
const requestedUrl = parseTarget(rawUrl);
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
const closeContext = async () => {
await context.close().catch(() => {});
};
try {
await gotoListing(page, requestedUrl);
await dismissCommonOverlays(page);
return await runWithOperationTimeout(
"Zillow photo extraction",
async () => {
await gotoListing(page, requestedUrl);
await dismissCommonOverlays(page);
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
const beforeUrl = page.url();
let clickedLabel = null;
let clickError = null;
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
const beforeUrl = page.url();
let clickedLabel = null;
let clickError = null;
try {
clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS);
await waitForPhotoExperience(page, beforeUrl);
await scrollUntilSettled(page);
await sleep(1200);
} catch (error) {
clickError = error instanceof Error ? error.message : String(error);
}
try {
clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS);
await waitForPhotoExperience(page, beforeUrl);
await scrollUntilSettled(page);
await sleep(1200);
} catch (error) {
clickError = error instanceof Error ? error.message : String(error);
}
const [structuredCandidates, renderedCandidates] = await Promise.all([
collectZillowStructuredPhotoCandidates(page),
collectZillowPhotoCandidates(page),
]);
const candidates = [...structuredCandidates, ...renderedCandidates];
const normalized = normalizeImageCandidates(candidates, {
hostIncludes: ["photos.zillowstatic.com"],
minWidth: 240,
minHeight: 180,
});
const photos = collapseZillowPhotos(normalized);
const [structuredCandidates, renderedCandidates] = await Promise.all([
collectZillowStructuredPhotoCandidates(page),
collectZillowPhotoCandidates(page),
]);
const candidates = [...structuredCandidates, ...renderedCandidates];
const normalized = normalizeImageCandidates(candidates, {
hostIncludes: ["photos.zillowstatic.com"],
minWidth: 240,
minHeight: 180,
});
const photos = collapseZillowPhotos(normalized);
if (!photos.length) {
fail(
"Zillow photo extraction failed.",
clickError || "No Zillow image URLs were found on the rendered listing page."
);
}
if (!photos.length) {
fail(
"Zillow photo extraction failed.",
clickError || "No Zillow image URLs were found on the rendered listing page."
);
}
const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true;
const notes = [];
if (clickedLabel) {
notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs.");
} else {
notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path.");
}
if (clickError) {
notes.push(`All-photos click path was not required: ${clickError}`);
}
const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true;
const notes = [];
if (clickedLabel) {
notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs.");
} else {
notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path.");
}
if (clickError) {
notes.push(`All-photos click path was not required: ${clickError}`);
}
const result = {
source: "zillow",
requestedUrl,
finalUrl: page.url(),
title: await page.title(),
clickedLabel,
expectedPhotoCount,
complete,
photoCount: photos.length,
imageUrls: photos.map((photo) => photo.url),
notes,
};
await context.close();
return result;
return {
source: "zillow",
requestedUrl,
finalUrl: page.url(),
title: await page.title(),
clickedLabel,
expectedPhotoCount,
complete,
photoCount: photos.length,
imageUrls: photos.map((photo) => photo.url),
notes,
};
},
{
onTimeout: closeContext
}
);
} catch (error) {
try {
await context.close();
} catch {
// Ignore close errors after the primary failure.
}
throw new Error(error instanceof Error ? error.message : String(error));
} finally {
await closeContext();
}
}