fix: make property-assessor safer for whatsapp runs
This commit is contained in:
180
docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md
Normal file
180
docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md
Normal file
@@ -0,0 +1,180 @@
|
||||
# Property Assessor WhatsApp-Safe Runtime Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Keep WhatsApp property-assessor runs moving by failing fast on silent discovery/photo hangs, avoiding helper subprocesses during core analysis, and reserving subprocess use to the final PDF render attempt only.
|
||||
|
||||
**Architecture:** Add small timeout guards around the existing in-process listing discovery and photo extraction calls so one quiet browser-backed task cannot stall the entire assessment. Tighten the live skill and published docs so messaging runs treat chat-native source collection as the default path and helper commands as non-chat or final-render-only tools.
|
||||
|
||||
**Tech Stack:** TypeScript, Node test runner, existing OpenClaw property-assessor skill, existing OpenClaw web-automation modules, Markdown docs.
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Add failing timeout tests for discovery and photo extraction
|
||||
|
||||
**Files:**
|
||||
- Modify: `skills/property-assessor/tests/assessment.test.ts`
|
||||
|
||||
**Step 1: Write the failing test**
|
||||
|
||||
Add tests that stub discovery/photo functions with never-resolving promises and assert that:
|
||||
- listing discovery returns `null` URLs and records timeout attempts
|
||||
- photo extraction returns `not completed` instead of hanging forever
|
||||
|
||||
**Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `npm test -- --test-name-pattern "times out"`
|
||||
Expected: FAIL because current code never times out or records timeout attempts.
|
||||
|
||||
**Step 3: Write minimal implementation**
|
||||
|
||||
Implement the smallest timeout wrapper needed for the tests to pass.
|
||||
|
||||
**Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `npm test -- --test-name-pattern "times out"`
|
||||
Expected: PASS
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add skills/property-assessor/tests/assessment.test.ts
|
||||
git commit -m "test: cover stalled discovery and photo extraction"
|
||||
```
|
||||
|
||||
### Task 2: Implement hard timeout guards in the live assessment path
|
||||
|
||||
**Files:**
|
||||
- Create: `skills/property-assessor/src/async-timeout.ts`
|
||||
- Modify: `skills/property-assessor/src/listing-discovery.ts`
|
||||
- Modify: `skills/property-assessor/src/photo-review.ts`
|
||||
|
||||
**Step 1: Write the failing test**
|
||||
|
||||
Use the tests from Task 1 as the red phase.
|
||||
|
||||
**Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `npm test -- --test-name-pattern "times out"`
|
||||
Expected: FAIL
|
||||
|
||||
**Step 3: Write minimal implementation**
|
||||
|
||||
Add:
|
||||
- a shared timeout helper for async operations
|
||||
- timeout-wrapped Zillow/HAR discovery in `listing-discovery.ts`
|
||||
- timeout-wrapped Zillow/HAR photo extraction in `photo-review.ts`
|
||||
- clear timeout attempt messages so the assessment can continue honestly
|
||||
|
||||
**Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `npm test -- --test-name-pattern "times out"`
|
||||
Expected: PASS
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add skills/property-assessor/src/async-timeout.ts skills/property-assessor/src/listing-discovery.ts skills/property-assessor/src/photo-review.ts skills/property-assessor/tests/assessment.test.ts
|
||||
git commit -m "fix: fail fast on stalled property-assessor extraction steps"
|
||||
```
|
||||
|
||||
### Task 3: Tighten live skill instructions for WhatsApp-safe execution
|
||||
|
||||
**Files:**
|
||||
- Modify: `../skills/property-assessor/SKILL.md`
|
||||
|
||||
**Step 1: Write the failing test**
|
||||
|
||||
No automated test. Use the documented runtime rule as the spec:
|
||||
- WhatsApp/messaging runs must avoid helper subprocesses for core analysis
|
||||
- only the final PDF render attempt may use the helper subprocess path
|
||||
- `update?` must remain status-only
|
||||
|
||||
**Step 2: Run verification to confirm current docs are wrong**
|
||||
|
||||
Run: `rg -n "scripts/property-assessor assess|node zillow-photos|node har-photos|Good:" ../skills/property-assessor/SKILL.md`
|
||||
Expected: current doc still presents helper commands as normal chat-safe core workflow.
|
||||
|
||||
**Step 3: Write minimal implementation**
|
||||
|
||||
Update the live skill doc to:
|
||||
- prefer `web_search`, `web_fetch`, and bounded `web-automation` for core assessment
|
||||
- forbid `scripts/property-assessor assess`, `node zillow-photos.js`, `node har-photos.js`, and ad hoc `curl` as the default WhatsApp core path
|
||||
- allow a single final PDF render attempt only after a decision-grade verdict exists
|
||||
|
||||
**Step 4: Run verification**
|
||||
|
||||
Run: `sed -n '1,220p' ../skills/property-assessor/SKILL.md`
|
||||
Expected: the WhatsApp-safe runtime rules are explicit and unambiguous.
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add ../skills/property-assessor/SKILL.md
|
||||
git commit -m "docs: clarify whatsapp-safe property-assessor execution"
|
||||
```
|
||||
|
||||
### Task 4: Mirror the runtime guidance into the published repo docs
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/property-assessor.md`
|
||||
- Modify: `docs/web-automation.md`
|
||||
|
||||
**Step 1: Write the failing test**
|
||||
|
||||
No automated test. The spec is consistency with the live skill instructions.
|
||||
|
||||
**Step 2: Run verification to confirm current docs drift**
|
||||
|
||||
Run: `rg -n "node zillow-photos|node har-photos|assess --address" docs/property-assessor.md docs/web-automation.md`
|
||||
Expected: current docs still imply subprocess-heavy commands are the standard chat path.
|
||||
|
||||
**Step 3: Write minimal implementation**
|
||||
|
||||
Document:
|
||||
- chat-native assessment first
|
||||
- timeout-protected discovery/photo extraction behavior
|
||||
- final-render-only subprocess attempt from messaging runs
|
||||
|
||||
**Step 4: Run verification**
|
||||
|
||||
Run: `sed -n '1,220p' docs/property-assessor.md && sed -n '1,220p' docs/web-automation.md`
|
||||
Expected: published docs match the live skill behavior.
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/property-assessor.md docs/web-automation.md
|
||||
git commit -m "docs: document whatsapp-safe property assessment flow"
|
||||
```
|
||||
|
||||
### Task 5: Verify the focused runtime behavior
|
||||
|
||||
**Files:**
|
||||
- Modify: `skills/property-assessor/tests/assessment.test.ts`
|
||||
- Verify: `skills/property-assessor/src/*.ts`
|
||||
- Verify: `../skills/property-assessor/SKILL.md`
|
||||
- Verify: `docs/property-assessor.md`
|
||||
- Verify: `docs/web-automation.md`
|
||||
|
||||
**Step 1: Run focused tests**
|
||||
|
||||
Run: `npm test`
|
||||
Expected: all `property-assessor` tests pass, including timeout coverage.
|
||||
|
||||
**Step 2: Run targeted source verification**
|
||||
|
||||
Run: `rg -n "withTimeout|timed out|final PDF render" skills/property-assessor/src ../skills/property-assessor/SKILL.md docs/property-assessor.md docs/web-automation.md`
|
||||
Expected: timeout guards and the final-render-only messaging rule are present.
|
||||
|
||||
**Step 3: Inspect git status**
|
||||
|
||||
Run: `git status --short`
|
||||
Expected: only intended files are modified.
|
||||
|
||||
**Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add skills/property-assessor/src/async-timeout.ts skills/property-assessor/src/listing-discovery.ts skills/property-assessor/src/photo-review.ts skills/property-assessor/tests/assessment.test.ts ../skills/property-assessor/SKILL.md docs/property-assessor.md docs/web-automation.md docs/plans/2026-03-28-property-assessor-whatsapp-safe-runtime.md
|
||||
git commit -m "fix: make property-assessor safer for whatsapp runs"
|
||||
```
|
||||
@@ -67,6 +67,9 @@ Operational rule:
|
||||
- Preliminary helper output should be treated as structured scaffolding for the remaining work, not as a reason to stop and wait for another user nudge.
|
||||
- In chat/messaging runs, do not waste the turn on `npm install` or `npm ci` when the local skill dependencies are already present.
|
||||
- If the user asks `update?` or `and?` mid-run, treat that as a status request and continue the same assessment rather than restarting or stopping at the last checkpoint.
|
||||
- In WhatsApp or similar messaging runs, keep the core analysis on native `web_search`, `web_fetch`, and bounded browser actions. Do not make helper subprocesses the default path for discovery, CAD lookup, or photo extraction.
|
||||
- In those messaging runs, reserve subprocess use for a single final `render-report` attempt after the verdict and fair-value range are complete.
|
||||
- If final PDF render/send fails, return the completed decision-grade report in chat and report delivery failure separately rather than restarting the whole assessment.
|
||||
|
||||
### `assess`
|
||||
|
||||
@@ -84,6 +87,7 @@ Current behavior:
|
||||
- automatically tries to discover Zillow and HAR listing URLs from the address when no listing URL is provided
|
||||
- runs Zillow photo extraction first, then HAR as fallback when available
|
||||
- reuses the OpenClaw web-automation logic in-process instead of spawning nested helper commands
|
||||
- fails fast when Zillow/HAR discovery or photo extraction stalls instead of hanging indefinitely
|
||||
- returns a structured preliminary report payload
|
||||
- does not require recipient email(s) for the analysis-only run
|
||||
- asks for recipient email(s) only when PDF rendering is explicitly requested
|
||||
@@ -226,7 +230,9 @@ When those extractors return `imageUrls`, that returned set is the photo-review
|
||||
|
||||
## Approval-safe command shape
|
||||
|
||||
For chat-driven runs, prefer file-based commands.
|
||||
For local/manual runs, prefer file-based commands.
|
||||
|
||||
For chat-driven messaging runs, prefer native `web_search`, `web_fetch`, and bounded browser actions first. Treat these commands as local/manual helpers or non-chat fallbacks, not as the default core-analysis path.
|
||||
|
||||
Good:
|
||||
|
||||
@@ -240,6 +246,13 @@ Good:
|
||||
- `scripts/property-assessor render-report --input ... --output ...`
|
||||
- `web_fetch` to read an official CAD / assessor page when the TypeScript helper needs a plain page fetch
|
||||
|
||||
Messaging default:
|
||||
|
||||
- `web_search` to discover listing and public-record URLs
|
||||
- `web_fetch` for official CAD / assessor pages and accessible listing pages
|
||||
- bounded `web-automation` actions for rendered listing/photo views
|
||||
- one final `scripts/property-assessor render-report ...` attempt only after the decision-grade report is complete
|
||||
|
||||
Avoid when possible:
|
||||
|
||||
- `node -e "..."`
|
||||
|
||||
@@ -18,6 +18,11 @@ Automated web browsing and scraping using Playwright-compatible CloakBrowser, wi
|
||||
- Use `node skills/web-automation/scripts/zillow-discover.js "<street-address>"` or `har-discover.js` to resolve a real-estate listing URL from an address
|
||||
- Use `node skills/web-automation/scripts/zillow-photos.js "<listing-url>"` or `har-photos.js` for real-estate photo extraction before attempting generic gallery automation
|
||||
|
||||
Messaging rule:
|
||||
- For WhatsApp or similar chat-driven runs, prefer native `web_search`, `web_fetch`, and bounded browser actions over shelling out to helper scripts for every core step.
|
||||
- Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks.
|
||||
- If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment.
|
||||
|
||||
## Requirements
|
||||
|
||||
- Node.js 20+
|
||||
@@ -148,6 +153,7 @@ What it does:
|
||||
- resolves directly to a property page when Zillow supports the address slug
|
||||
- otherwise looks for a `homedetails` listing link in the rendered page
|
||||
- returns the discovered listing URL as JSON
|
||||
- fails fast with a timeout if the browser-backed discovery stalls
|
||||
|
||||
### HAR discovery
|
||||
|
||||
@@ -161,6 +167,7 @@ What it does:
|
||||
- looks for a confident `homedetail` match in rendered results
|
||||
- returns the discovered listing URL when HAR exposes a strong enough match
|
||||
- returns `listingUrl: null` when HAR discovery is not confident enough
|
||||
- fails fast with a timeout if the browser-backed discovery stalls
|
||||
|
||||
### Zillow
|
||||
|
||||
@@ -174,6 +181,7 @@ What it does:
|
||||
- tries the `See all photos` / `See all X photos` entry point
|
||||
- if Zillow keeps the click path flaky, falls back to the listing's embedded `__NEXT_DATA__` payload
|
||||
- returns direct `photos.zillowstatic.com` image URLs as JSON
|
||||
- fails fast with a timeout if the browser-backed extraction stalls
|
||||
|
||||
Expected success shape:
|
||||
- `complete: true`
|
||||
@@ -191,6 +199,7 @@ What it does:
|
||||
- opens the HAR listing page
|
||||
- clicks `Show all photos` / `View all photos`
|
||||
- extracts the direct `pics.harstatic.com` image URLs from the all-photos page
|
||||
- fails fast with a timeout if the browser-backed extraction stalls
|
||||
|
||||
Expected success shape:
|
||||
- `complete: true`
|
||||
|
||||
@@ -56,10 +56,14 @@ Rules:
|
||||
- Avoid fragile interactive gallery flows if they are likely to require approval or bounce the user into Control UI.
|
||||
- If a richer photo pass would require approval, do not silently force that path first. Continue with the best approval-free workflow available and clearly lower confidence if needed.
|
||||
- Only escalate to approval-heavy browser interaction when there is no reasonable alternative and the extra fidelity materially changes the assessment.
|
||||
- In WhatsApp or similar messaging runs, keep the core assessment on native chat-safe tools: `web_search`, `web_fetch`, and bounded `web-automation` browsing/extraction.
|
||||
- In those messaging runs, do **not** make `scripts/property-assessor assess`, `scripts/property-assessor locate-public-records`, `node zillow-discover.js`, `node har-discover.js`, `node zillow-photos.js`, `node har-photos.js`, `curl`, or `wget` the default core-analysis path.
|
||||
- From messaging runs, the only subprocess-style step you should attempt by default is the final `scripts/property-assessor render-report` call after the verdict, fair-value range, and report body are complete.
|
||||
- If the final PDF render fails, return the complete decision-grade report in chat and say the render/send step failed. Do not restart the whole assessment.
|
||||
- Do **not** run `npm install`, `npm ci`, or other dependency-setup commands during a normal chat assessment flow when the local skill already has `node_modules` present.
|
||||
- On this machine, treat `property-assessor` dependencies as already installed. Use the helper entrypoints directly unless the wrapper itself explicitly reports missing local dependencies.
|
||||
- Do **not** use ad hoc shell snippets, heredocs, or inline interpreter eval for public-record or CAD lookup from chat. Avoid forms like `python3 - <<'PY'`, `python -c`, `node -e`, `node --input-type=module -e`, or raw `bash -lc '...'` probes.
|
||||
- For public-record enrichment, use `scripts/property-assessor locate-public-records`, `scripts/property-assessor assess`, or `web_fetch`. If a one-off fetch helper is truly needed, add a file-based helper under the skill tree first and run that file-based entrypoint instead of inline code.
|
||||
- For public-record enrichment in chat, prefer `web_fetch` plus official assessor/CAD pages. If a one-off helper is truly needed, add a file-based helper under the skill tree first and use it from a non-messaging surface instead of inline code.
|
||||
|
||||
## Source order
|
||||
|
||||
@@ -75,6 +79,8 @@ Use `web_search` sparingly to discover alternate URLs, then return to `web-autom
|
||||
|
||||
## Helper runtime
|
||||
|
||||
These helper commands are primarily for local/manual runs, non-chat surfaces, and the final PDF render step. They are **not** the default WhatsApp core-analysis path.
|
||||
|
||||
`property-assessor` now includes TypeScript helper commands for:
|
||||
- address-first preliminary assessment assembly
|
||||
- public-record jurisdiction lookup
|
||||
@@ -98,7 +104,7 @@ scripts/property-assessor locate-public-records --address "<street-address>"
|
||||
scripts/property-assessor render-report --input "<report-payload-json>" --output "<output-pdf>"
|
||||
```
|
||||
|
||||
`assess` is the address-first entrypoint. It should:
|
||||
`assess` is the address-first entrypoint for helper-driven runs. It should:
|
||||
- require the assessment purpose
|
||||
- treat the assessment purpose as missing unless it is present in the current request or explicitly confirmed as unchanged from earlier context
|
||||
- resolve official public-record jurisdiction automatically from the address
|
||||
@@ -131,18 +137,18 @@ Default approach:
|
||||
|
||||
Approval-safe rule:
|
||||
- do not perform CAD/public-record discovery with inline shell or interpreter snippets
|
||||
- use the built-in TypeScript helper path first
|
||||
- if the official site needs a plain page fetch, prefer `web_fetch`
|
||||
- if rendered interaction is unavoidable, use a file-based `web-automation` helper rather than ad hoc shell text
|
||||
- in chat/messaging runs, prefer `web_fetch` plus official CAD/assessor pages first
|
||||
- use the built-in TypeScript helper path on local/manual surfaces or for the final PDF render step
|
||||
- if rendered interaction is unavoidable, use bounded `web-automation` behavior rather than ad hoc shell text
|
||||
|
||||
Use the helper CLI first:
|
||||
Use the helper CLI first on local/manual surfaces:
|
||||
|
||||
```bash
|
||||
cd ~/.openclaw/workspace/skills/property-assessor
|
||||
scripts/property-assessor locate-public-records --address "<street-address>"
|
||||
```
|
||||
|
||||
When you want the helper to assemble the preliminary assessment payload in one step, use:
|
||||
When you want the helper to assemble the preliminary assessment payload in one step from a non-messaging surface, use:
|
||||
|
||||
```bash
|
||||
cd ~/.openclaw/workspace/skills/property-assessor
|
||||
|
||||
37
skills/property-assessor/src/async-timeout.ts
Normal file
37
skills/property-assessor/src/async-timeout.ts
Normal file
@@ -0,0 +1,37 @@
|
||||
export class TimeoutError extends Error {
|
||||
readonly timeoutMs: number;
|
||||
|
||||
constructor(operationName: string, timeoutMs: number) {
|
||||
super(`${operationName} timed out after ${timeoutMs}ms`);
|
||||
this.name = "TimeoutError";
|
||||
this.timeoutMs = timeoutMs;
|
||||
}
|
||||
}
|
||||
|
||||
export async function withTimeout<T>(
|
||||
operation: () => Promise<T>,
|
||||
{
|
||||
operationName,
|
||||
timeoutMs
|
||||
}: {
|
||||
operationName: string;
|
||||
timeoutMs: number;
|
||||
}
|
||||
): Promise<T> {
|
||||
let timer: NodeJS.Timeout | undefined;
|
||||
|
||||
try {
|
||||
return await Promise.race([
|
||||
operation(),
|
||||
new Promise<T>((_, reject) => {
|
||||
timer = setTimeout(() => {
|
||||
reject(new TimeoutError(operationName, timeoutMs));
|
||||
}, timeoutMs);
|
||||
})
|
||||
]);
|
||||
} finally {
|
||||
if (timer) {
|
||||
clearTimeout(timer);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,5 +1,6 @@
|
||||
import { discoverHarListing } from "../../web-automation/scripts/har-discover.js";
|
||||
import { discoverZillowListing } from "../../web-automation/scripts/zillow-discover.js";
|
||||
import { TimeoutError, withTimeout } from "./async-timeout.js";
|
||||
|
||||
export interface ListingDiscoveryResult {
|
||||
attempts: string[];
|
||||
@@ -7,29 +8,65 @@ export interface ListingDiscoveryResult {
|
||||
harUrl: string | null;
|
||||
}
|
||||
|
||||
export async function discoverListingSources(address: string): Promise<ListingDiscoveryResult> {
|
||||
interface ListingDiscoveryDeps {
|
||||
timeoutMs?: number;
|
||||
discoverZillowListingFn?: typeof discoverZillowListing;
|
||||
discoverHarListingFn?: typeof discoverHarListing;
|
||||
}
|
||||
|
||||
const DEFAULT_DISCOVERY_TIMEOUT_MS = Number(
|
||||
process.env.PROPERTY_ASSESSOR_DISCOVERY_TIMEOUT_MS || 20_000
|
||||
);
|
||||
|
||||
export async function discoverListingSources(
|
||||
address: string,
|
||||
deps: ListingDiscoveryDeps = {}
|
||||
): Promise<ListingDiscoveryResult> {
|
||||
const attempts: string[] = [];
|
||||
let zillowUrl: string | null = null;
|
||||
let harUrl: string | null = null;
|
||||
const timeoutMs = deps.timeoutMs ?? DEFAULT_DISCOVERY_TIMEOUT_MS;
|
||||
const discoverZillowListingFn = deps.discoverZillowListingFn || discoverZillowListing;
|
||||
const discoverHarListingFn = deps.discoverHarListingFn || discoverHarListing;
|
||||
|
||||
try {
|
||||
const result = await discoverZillowListing(address);
|
||||
const result = await withTimeout(
|
||||
() => discoverZillowListingFn(address),
|
||||
{
|
||||
operationName: "Zillow discovery",
|
||||
timeoutMs
|
||||
}
|
||||
);
|
||||
zillowUrl = result.listingUrl;
|
||||
attempts.push(...result.attempts);
|
||||
} catch (error) {
|
||||
attempts.push(
|
||||
`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`
|
||||
);
|
||||
if (error instanceof TimeoutError) {
|
||||
attempts.push(`Zillow discovery timed out after ${timeoutMs}ms.`);
|
||||
} else {
|
||||
attempts.push(
|
||||
`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
try {
|
||||
const result = await discoverHarListing(address);
|
||||
const result = await withTimeout(
|
||||
() => discoverHarListingFn(address),
|
||||
{
|
||||
operationName: "HAR discovery",
|
||||
timeoutMs
|
||||
}
|
||||
);
|
||||
harUrl = result.listingUrl;
|
||||
attempts.push(...result.attempts);
|
||||
} catch (error) {
|
||||
attempts.push(
|
||||
`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`
|
||||
);
|
||||
if (error instanceof TimeoutError) {
|
||||
attempts.push(`HAR discovery timed out after ${timeoutMs}ms.`);
|
||||
} else {
|
||||
attempts.push(
|
||||
`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import { extractHarPhotos } from "../../web-automation/scripts/har-photos.js";
|
||||
import { extractZillowPhotos } from "../../web-automation/scripts/zillow-photos.js";
|
||||
import { withTimeout } from "./async-timeout.js";
|
||||
|
||||
export type PhotoSource = "zillow" | "har";
|
||||
|
||||
@@ -19,12 +20,33 @@ export interface PhotoReviewResolution {
|
||||
discoveredListingUrls: Array<{ label: string; url: string }>;
|
||||
}
|
||||
|
||||
interface PhotoReviewDeps {
|
||||
timeoutMs?: number;
|
||||
extractZillowPhotosFn?: typeof extractZillowPhotos;
|
||||
extractHarPhotosFn?: typeof extractHarPhotos;
|
||||
}
|
||||
|
||||
const DEFAULT_PHOTO_EXTRACTION_TIMEOUT_MS = Number(
|
||||
process.env.PROPERTY_ASSESSOR_PHOTO_TIMEOUT_MS || 25_000
|
||||
);
|
||||
|
||||
export async function extractPhotoData(
|
||||
source: PhotoSource,
|
||||
url: string
|
||||
url: string,
|
||||
deps: PhotoReviewDeps = {}
|
||||
): Promise<PhotoExtractionResult> {
|
||||
const timeoutMs = deps.timeoutMs ?? DEFAULT_PHOTO_EXTRACTION_TIMEOUT_MS;
|
||||
const extractZillowPhotosFn = deps.extractZillowPhotosFn || extractZillowPhotos;
|
||||
const extractHarPhotosFn = deps.extractHarPhotosFn || extractHarPhotos;
|
||||
|
||||
if (source === "zillow") {
|
||||
const payload = await extractZillowPhotos(url);
|
||||
const payload = await withTimeout(
|
||||
() => extractZillowPhotosFn(url),
|
||||
{
|
||||
operationName: "Zillow photo extraction",
|
||||
timeoutMs
|
||||
}
|
||||
);
|
||||
return {
|
||||
source,
|
||||
requestedUrl: String(payload.requestedUrl || url),
|
||||
@@ -37,7 +59,13 @@ export async function extractPhotoData(
|
||||
};
|
||||
}
|
||||
|
||||
const payload = await extractHarPhotos(url);
|
||||
const payload = await withTimeout(
|
||||
() => extractHarPhotosFn(url),
|
||||
{
|
||||
operationName: "HAR photo extraction",
|
||||
timeoutMs
|
||||
}
|
||||
);
|
||||
return {
|
||||
source,
|
||||
requestedUrl: String(payload.requestedUrl || url),
|
||||
|
||||
32
skills/property-assessor/tests/timeout-guards.test.ts
Normal file
32
skills/property-assessor/tests/timeout-guards.test.ts
Normal file
@@ -0,0 +1,32 @@
|
||||
import test from "node:test";
|
||||
import assert from "node:assert/strict";
|
||||
|
||||
import { discoverListingSources } from "../src/listing-discovery.js";
|
||||
import { extractPhotoData } from "../src/photo-review.js";
|
||||
|
||||
test("discoverListingSources times out stalled Zillow and HAR discovery calls", async () => {
|
||||
const result = await discoverListingSources(
|
||||
"1011 Ennis Joslin Rd APT 235, Corpus Christi, TX 78412",
|
||||
{
|
||||
timeoutMs: 20,
|
||||
discoverZillowListingFn: async () => await new Promise(() => {}),
|
||||
discoverHarListingFn: async () => await new Promise(() => {})
|
||||
}
|
||||
);
|
||||
|
||||
assert.equal(result.zillowUrl, null);
|
||||
assert.equal(result.harUrl, null);
|
||||
assert.match(result.attempts.join(" "), /zillow discovery timed out/i);
|
||||
assert.match(result.attempts.join(" "), /har discovery timed out/i);
|
||||
});
|
||||
|
||||
test("extractPhotoData times out a stalled photo extraction instead of hanging forever", async () => {
|
||||
await assert.rejects(
|
||||
async () =>
|
||||
extractPhotoData("zillow", "https://www.zillow.com/example", {
|
||||
timeoutMs: 20,
|
||||
extractZillowPhotosFn: async () => await new Promise(() => {})
|
||||
}),
|
||||
/timed out/i
|
||||
);
|
||||
});
|
||||
@@ -76,6 +76,11 @@ pnpm rebuild better-sqlite3 esbuild
|
||||
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
|
||||
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`
|
||||
|
||||
Messaging rule:
|
||||
- For WhatsApp or similar chat-driven runs, prefer native `web_search`, `web_fetch`, and bounded browser actions over shelling out to these helper scripts for every core step.
|
||||
- Treat the dedicated Zillow/HAR scripts as local/manual helpers, regression checks, or non-chat fallbacks.
|
||||
- If a messaging workflow needs a subprocess at all, reserve it for a single final delivery step rather than the whole assessment.
|
||||
|
||||
## OpenClaw Exec Approvals / Allowlist
|
||||
|
||||
If OpenClaw prompts for exec approval every time this skill runs, add a local approvals allowlist for the main agent before retrying. This is especially helpful for repeated `extract.js`, `browse.ts`, and other CloakBrowser-backed scrapes.
|
||||
@@ -154,6 +159,7 @@ The photo scripts are purpose-built for the common `See all photos` / `Show all
|
||||
- click the all-photos entry point
|
||||
- wait for the resulting photo page or scroller view
|
||||
- extract direct image URLs from the rendered page
|
||||
- fail fast with a timeout instead of hanging indefinitely when the browser-backed extraction stalls
|
||||
|
||||
Output is JSON with:
|
||||
- `requestedUrl`
|
||||
|
||||
@@ -7,6 +7,7 @@ import {
|
||||
dismissCommonOverlays,
|
||||
fail,
|
||||
gotoListing,
|
||||
runWithOperationTimeout,
|
||||
sleep,
|
||||
} from "./real-estate-photo-common.js";
|
||||
import { parseAddressIdentity, scoreAddressCandidate } from "./real-estate-address.js";
|
||||
@@ -64,63 +65,69 @@ export async function discoverHarListing(rawAddress) {
|
||||
const identity = parseAddressIdentity(address);
|
||||
const searchUrl = buildSearchUrl(address);
|
||||
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
|
||||
const closeContext = async () => {
|
||||
await context.close().catch(() => {});
|
||||
};
|
||||
|
||||
try {
|
||||
const attempts = [`Opened HAR search URL: ${searchUrl}`];
|
||||
await gotoListing(page, searchUrl, 2500);
|
||||
await dismissCommonOverlays(page);
|
||||
await sleep(1500);
|
||||
return await runWithOperationTimeout(
|
||||
"HAR discovery",
|
||||
async () => {
|
||||
const attempts = [`Opened HAR search URL: ${searchUrl}`];
|
||||
await gotoListing(page, searchUrl, 2500);
|
||||
await dismissCommonOverlays(page);
|
||||
await sleep(1500);
|
||||
|
||||
let listingUrl = null;
|
||||
if (page.url().includes("/homedetail/")) {
|
||||
const directScore = scoreAddressCandidate(
|
||||
identity,
|
||||
`${page.url()} ${(await page.title()) || ""}`
|
||||
);
|
||||
if (directScore.matched) {
|
||||
listingUrl = normalizeListingUrl(page.url());
|
||||
attempts.push("HAR search URL resolved directly to a matching property page.");
|
||||
} else {
|
||||
attempts.push("HAR redirected to a property page, but it did not match the requested address closely enough.");
|
||||
}
|
||||
} else {
|
||||
const discovered = await collectListingUrl(page);
|
||||
const scored = discovered
|
||||
.map((candidate) => {
|
||||
const match = scoreAddressCandidate(
|
||||
let listingUrl = null;
|
||||
if (page.url().includes("/homedetail/")) {
|
||||
const directScore = scoreAddressCandidate(
|
||||
identity,
|
||||
`${candidate.url} ${candidate.text} ${candidate.parentText}`
|
||||
`${page.url()} ${(await page.title()) || ""}`
|
||||
);
|
||||
return { ...candidate, match };
|
||||
})
|
||||
.sort((a, b) => b.match.score - a.match.score);
|
||||
if (directScore.matched) {
|
||||
listingUrl = normalizeListingUrl(page.url());
|
||||
attempts.push("HAR search URL resolved directly to a matching property page.");
|
||||
} else {
|
||||
attempts.push("HAR redirected to a property page, but it did not match the requested address closely enough.");
|
||||
}
|
||||
} else {
|
||||
const discovered = await collectListingUrl(page);
|
||||
const scored = discovered
|
||||
.map((candidate) => {
|
||||
const match = scoreAddressCandidate(
|
||||
identity,
|
||||
`${candidate.url} ${candidate.text} ${candidate.parentText}`
|
||||
);
|
||||
return { ...candidate, match };
|
||||
})
|
||||
.sort((a, b) => b.match.score - a.match.score);
|
||||
|
||||
if (scored[0]?.match.matched) {
|
||||
listingUrl = normalizeListingUrl(scored[0].url);
|
||||
attempts.push(`HAR search results exposed a matching homedetail link with score ${scored[0].match.score}.`);
|
||||
} else {
|
||||
attempts.push("HAR discovery did not expose a confident homedetail match for this address.");
|
||||
if (scored[0]?.match.matched) {
|
||||
listingUrl = normalizeListingUrl(scored[0].url);
|
||||
attempts.push(`HAR search results exposed a matching homedetail link with score ${scored[0].match.score}.`);
|
||||
} else {
|
||||
attempts.push("HAR discovery did not expose a confident homedetail match for this address.");
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
source: "har",
|
||||
address,
|
||||
searchUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
listingUrl,
|
||||
attempts,
|
||||
};
|
||||
},
|
||||
{
|
||||
onTimeout: closeContext
|
||||
}
|
||||
}
|
||||
|
||||
const result = {
|
||||
source: "har",
|
||||
address,
|
||||
searchUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
listingUrl,
|
||||
attempts,
|
||||
};
|
||||
await context.close();
|
||||
return result;
|
||||
);
|
||||
} catch (error) {
|
||||
try {
|
||||
await context.close();
|
||||
} catch {
|
||||
// Ignore close errors after the primary failure.
|
||||
}
|
||||
throw new Error(`HAR discovery failed: ${error instanceof Error ? error.message : String(error)}`);
|
||||
} finally {
|
||||
await closeContext();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -11,6 +11,7 @@ import {
|
||||
gotoListing,
|
||||
normalizeImageCandidates,
|
||||
parseTarget,
|
||||
runWithOperationTimeout,
|
||||
scrollUntilSettled,
|
||||
sleep,
|
||||
waitForPhotoExperience,
|
||||
@@ -34,51 +35,56 @@ async function getAnnouncedPhotoCount(page) {
|
||||
export async function extractHarPhotos(rawUrl) {
|
||||
const requestedUrl = parseTarget(rawUrl);
|
||||
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
|
||||
const closeContext = async () => {
|
||||
await context.close().catch(() => {});
|
||||
};
|
||||
|
||||
try {
|
||||
await gotoListing(page, requestedUrl);
|
||||
await dismissCommonOverlays(page);
|
||||
return await runWithOperationTimeout(
|
||||
"HAR photo extraction",
|
||||
async () => {
|
||||
await gotoListing(page, requestedUrl);
|
||||
await dismissCommonOverlays(page);
|
||||
|
||||
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
|
||||
const beforeUrl = page.url();
|
||||
const clickedLabel = await clickPhotoEntryPoint(page, HAR_LABELS);
|
||||
await waitForPhotoExperience(page, beforeUrl);
|
||||
await scrollUntilSettled(page);
|
||||
await sleep(1200);
|
||||
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
|
||||
const beforeUrl = page.url();
|
||||
const clickedLabel = await clickPhotoEntryPoint(page, HAR_LABELS);
|
||||
await waitForPhotoExperience(page, beforeUrl);
|
||||
await scrollUntilSettled(page);
|
||||
await sleep(1200);
|
||||
|
||||
const candidates = await collectRenderedImageCandidates(page);
|
||||
const photos = normalizeImageCandidates(candidates, {
|
||||
hostIncludes: ["pics.harstatic.com", "photos.harstatic.com"],
|
||||
minWidth: 240,
|
||||
minHeight: 180,
|
||||
});
|
||||
const candidates = await collectRenderedImageCandidates(page);
|
||||
const photos = normalizeImageCandidates(candidates, {
|
||||
hostIncludes: ["pics.harstatic.com", "photos.harstatic.com"],
|
||||
minWidth: 240,
|
||||
minHeight: 180,
|
||||
});
|
||||
|
||||
if (!photos.length) {
|
||||
fail("HAR photo extraction failed.", "No large image URLs were found after opening the HAR all-photos view.");
|
||||
}
|
||||
if (!photos.length) {
|
||||
fail("HAR photo extraction failed.", "No large image URLs were found after opening the HAR all-photos view.");
|
||||
}
|
||||
|
||||
const result = {
|
||||
source: "har",
|
||||
requestedUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
clickedLabel,
|
||||
expectedPhotoCount,
|
||||
complete: expectedPhotoCount ? photos.length >= expectedPhotoCount : true,
|
||||
photoCount: photos.length,
|
||||
imageUrls: photos.map((photo) => photo.url),
|
||||
notes: ["Opened HAR all-photos flow and extracted large rendered image URLs from the photo page."],
|
||||
};
|
||||
|
||||
await context.close();
|
||||
return result;
|
||||
return {
|
||||
source: "har",
|
||||
requestedUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
clickedLabel,
|
||||
expectedPhotoCount,
|
||||
complete: expectedPhotoCount ? photos.length >= expectedPhotoCount : true,
|
||||
photoCount: photos.length,
|
||||
imageUrls: photos.map((photo) => photo.url),
|
||||
notes: ["Opened HAR all-photos flow and extracted large rendered image URLs from the photo page."],
|
||||
};
|
||||
},
|
||||
{
|
||||
onTimeout: closeContext
|
||||
}
|
||||
);
|
||||
} catch (error) {
|
||||
try {
|
||||
await context.close();
|
||||
} catch {
|
||||
// Ignore close errors after the primary failure.
|
||||
}
|
||||
throw new Error(error instanceof Error ? error.message : String(error));
|
||||
} finally {
|
||||
await closeContext();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -7,6 +7,7 @@ const MAX_SCROLL_PASSES = 12;
|
||||
const SCROLL_PAUSE_MS = 900;
|
||||
const LARGE_IMAGE_MIN_WIDTH = 300;
|
||||
const LARGE_IMAGE_MIN_HEIGHT = 200;
|
||||
const OPERATION_TIMEOUT_MS = Number(process.env.REAL_ESTATE_OPERATION_TIMEOUT_MS || 25000);
|
||||
|
||||
export function fail(message, details) {
|
||||
const payload = { error: message };
|
||||
@@ -38,6 +39,34 @@ export function sleep(ms) {
|
||||
return new Promise((resolve) => setTimeout(resolve, ms));
|
||||
}
|
||||
|
||||
export async function runWithOperationTimeout(
|
||||
operationName,
|
||||
operation,
|
||||
{ timeoutMs = OPERATION_TIMEOUT_MS, onTimeout } = {}
|
||||
) {
|
||||
let timer;
|
||||
|
||||
try {
|
||||
return await Promise.race([
|
||||
operation(),
|
||||
new Promise((_, reject) => {
|
||||
timer = setTimeout(async () => {
|
||||
try {
|
||||
await onTimeout?.();
|
||||
} catch {
|
||||
// Ignore cleanup errors; the timeout is the primary failure.
|
||||
}
|
||||
reject(new Error(`${operationName} timed out after ${timeoutMs}ms`));
|
||||
}, timeoutMs);
|
||||
}),
|
||||
]);
|
||||
} finally {
|
||||
if (timer) {
|
||||
clearTimeout(timer);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
export async function loadCloakBrowser() {
|
||||
try {
|
||||
return await import("cloakbrowser");
|
||||
@@ -289,4 +318,3 @@ export function buildResult({
|
||||
notes,
|
||||
};
|
||||
}
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
import test from "node:test";
|
||||
import assert from "node:assert/strict";
|
||||
|
||||
import { normalizeImageCandidates } from "./real-estate-photo-common.js";
|
||||
import { normalizeImageCandidates, runWithOperationTimeout } from "./real-estate-photo-common.js";
|
||||
|
||||
test("normalizeImageCandidates keeps distinct Zillow photo URLs and strips query strings", () => {
|
||||
const result = normalizeImageCandidates(
|
||||
@@ -64,3 +64,24 @@ test("normalizeImageCandidates filters tiny HAR page assets and keeps large phot
|
||||
"https://photos.har.com/123/main.jpg",
|
||||
]);
|
||||
});
|
||||
|
||||
test("runWithOperationTimeout rejects stalled work and runs timeout cleanup", async () => {
|
||||
let cleanedUp = false;
|
||||
|
||||
await assert.rejects(
|
||||
async () =>
|
||||
runWithOperationTimeout(
|
||||
"HAR photo extraction",
|
||||
async () => await new Promise(() => {}),
|
||||
{
|
||||
timeoutMs: 20,
|
||||
onTimeout: async () => {
|
||||
cleanedUp = true;
|
||||
},
|
||||
}
|
||||
),
|
||||
/timed out/i
|
||||
);
|
||||
|
||||
assert.equal(cleanedUp, true);
|
||||
});
|
||||
|
||||
@@ -7,6 +7,7 @@ import {
|
||||
dismissCommonOverlays,
|
||||
fail,
|
||||
gotoListing,
|
||||
runWithOperationTimeout,
|
||||
sleep,
|
||||
} from "./real-estate-photo-common.js";
|
||||
import {
|
||||
@@ -68,63 +69,69 @@ export async function discoverZillowListing(rawAddress) {
|
||||
const identity = parseAddressIdentity(address);
|
||||
const searchUrl = `https://www.zillow.com/homes/${encodeURIComponent(buildZillowAddressSlug(address))}_rb/`;
|
||||
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
|
||||
const closeContext = async () => {
|
||||
await context.close().catch(() => {});
|
||||
};
|
||||
|
||||
try {
|
||||
const attempts = [`Opened Zillow address search URL: ${searchUrl}`];
|
||||
await gotoListing(page, searchUrl, 2500);
|
||||
await dismissCommonOverlays(page);
|
||||
await sleep(1500);
|
||||
return await runWithOperationTimeout(
|
||||
"Zillow discovery",
|
||||
async () => {
|
||||
const attempts = [`Opened Zillow address search URL: ${searchUrl}`];
|
||||
await gotoListing(page, searchUrl, 2500);
|
||||
await dismissCommonOverlays(page);
|
||||
await sleep(1500);
|
||||
|
||||
let listingUrl = null;
|
||||
if (page.url().includes("/homedetails/")) {
|
||||
const directScore = scoreAddressCandidate(
|
||||
identity,
|
||||
`${page.url()} ${(await page.title()) || ""}`
|
||||
);
|
||||
if (directScore.matched) {
|
||||
listingUrl = normalizeListingUrl(page.url());
|
||||
attempts.push("Zillow search URL resolved directly to a matching property page.");
|
||||
} else {
|
||||
attempts.push("Zillow redirected to a property page, but it did not match the requested address closely enough.");
|
||||
}
|
||||
} else {
|
||||
const discovered = await collectListingUrl(page);
|
||||
const scored = discovered
|
||||
.map((candidate) => ({
|
||||
...candidate,
|
||||
match: scoreAddressCandidate(
|
||||
let listingUrl = null;
|
||||
if (page.url().includes("/homedetails/")) {
|
||||
const directScore = scoreAddressCandidate(
|
||||
identity,
|
||||
`${candidate.url} ${candidate.text} ${candidate.aria} ${candidate.title} ${candidate.parentText}`
|
||||
)
|
||||
}))
|
||||
.sort((a, b) => b.match.score - a.match.score);
|
||||
`${page.url()} ${(await page.title()) || ""}`
|
||||
);
|
||||
if (directScore.matched) {
|
||||
listingUrl = normalizeListingUrl(page.url());
|
||||
attempts.push("Zillow search URL resolved directly to a matching property page.");
|
||||
} else {
|
||||
attempts.push("Zillow redirected to a property page, but it did not match the requested address closely enough.");
|
||||
}
|
||||
} else {
|
||||
const discovered = await collectListingUrl(page);
|
||||
const scored = discovered
|
||||
.map((candidate) => ({
|
||||
...candidate,
|
||||
match: scoreAddressCandidate(
|
||||
identity,
|
||||
`${candidate.url} ${candidate.text} ${candidate.aria} ${candidate.title} ${candidate.parentText}`
|
||||
)
|
||||
}))
|
||||
.sort((a, b) => b.match.score - a.match.score);
|
||||
|
||||
if (scored[0]?.match.matched) {
|
||||
listingUrl = normalizeListingUrl(scored[0].url);
|
||||
attempts.push(`Zillow search results exposed a matching homedetails link with score ${scored[0].match.score}.`);
|
||||
} else {
|
||||
attempts.push("Zillow discovery did not expose a confident homedetails match for this address.");
|
||||
if (scored[0]?.match.matched) {
|
||||
listingUrl = normalizeListingUrl(scored[0].url);
|
||||
attempts.push(`Zillow search results exposed a matching homedetails link with score ${scored[0].match.score}.`);
|
||||
} else {
|
||||
attempts.push("Zillow discovery did not expose a confident homedetails match for this address.");
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
source: "zillow",
|
||||
address,
|
||||
searchUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
listingUrl,
|
||||
attempts,
|
||||
};
|
||||
},
|
||||
{
|
||||
onTimeout: closeContext
|
||||
}
|
||||
}
|
||||
|
||||
const result = {
|
||||
source: "zillow",
|
||||
address,
|
||||
searchUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
listingUrl,
|
||||
attempts,
|
||||
};
|
||||
await context.close();
|
||||
return result;
|
||||
);
|
||||
} catch (error) {
|
||||
try {
|
||||
await context.close();
|
||||
} catch {
|
||||
// Ignore close errors after the primary failure.
|
||||
}
|
||||
throw new Error(`Zillow discovery failed: ${error instanceof Error ? error.message : String(error)}`);
|
||||
} finally {
|
||||
await closeContext();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -10,6 +10,7 @@ import {
|
||||
gotoListing,
|
||||
normalizeImageCandidates,
|
||||
parseTarget,
|
||||
runWithOperationTimeout,
|
||||
scrollUntilSettled,
|
||||
sleep,
|
||||
waitForPhotoExperience,
|
||||
@@ -107,77 +108,82 @@ async function collectZillowStructuredPhotoCandidates(page) {
|
||||
export async function extractZillowPhotos(rawUrl) {
|
||||
const requestedUrl = parseTarget(rawUrl);
|
||||
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
|
||||
const closeContext = async () => {
|
||||
await context.close().catch(() => {});
|
||||
};
|
||||
|
||||
try {
|
||||
await gotoListing(page, requestedUrl);
|
||||
await dismissCommonOverlays(page);
|
||||
return await runWithOperationTimeout(
|
||||
"Zillow photo extraction",
|
||||
async () => {
|
||||
await gotoListing(page, requestedUrl);
|
||||
await dismissCommonOverlays(page);
|
||||
|
||||
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
|
||||
const beforeUrl = page.url();
|
||||
let clickedLabel = null;
|
||||
let clickError = null;
|
||||
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
|
||||
const beforeUrl = page.url();
|
||||
let clickedLabel = null;
|
||||
let clickError = null;
|
||||
|
||||
try {
|
||||
clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS);
|
||||
await waitForPhotoExperience(page, beforeUrl);
|
||||
await scrollUntilSettled(page);
|
||||
await sleep(1200);
|
||||
} catch (error) {
|
||||
clickError = error instanceof Error ? error.message : String(error);
|
||||
}
|
||||
try {
|
||||
clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS);
|
||||
await waitForPhotoExperience(page, beforeUrl);
|
||||
await scrollUntilSettled(page);
|
||||
await sleep(1200);
|
||||
} catch (error) {
|
||||
clickError = error instanceof Error ? error.message : String(error);
|
||||
}
|
||||
|
||||
const [structuredCandidates, renderedCandidates] = await Promise.all([
|
||||
collectZillowStructuredPhotoCandidates(page),
|
||||
collectZillowPhotoCandidates(page),
|
||||
]);
|
||||
const candidates = [...structuredCandidates, ...renderedCandidates];
|
||||
const normalized = normalizeImageCandidates(candidates, {
|
||||
hostIncludes: ["photos.zillowstatic.com"],
|
||||
minWidth: 240,
|
||||
minHeight: 180,
|
||||
});
|
||||
const photos = collapseZillowPhotos(normalized);
|
||||
const [structuredCandidates, renderedCandidates] = await Promise.all([
|
||||
collectZillowStructuredPhotoCandidates(page),
|
||||
collectZillowPhotoCandidates(page),
|
||||
]);
|
||||
const candidates = [...structuredCandidates, ...renderedCandidates];
|
||||
const normalized = normalizeImageCandidates(candidates, {
|
||||
hostIncludes: ["photos.zillowstatic.com"],
|
||||
minWidth: 240,
|
||||
minHeight: 180,
|
||||
});
|
||||
const photos = collapseZillowPhotos(normalized);
|
||||
|
||||
if (!photos.length) {
|
||||
fail(
|
||||
"Zillow photo extraction failed.",
|
||||
clickError || "No Zillow image URLs were found on the rendered listing page."
|
||||
);
|
||||
}
|
||||
if (!photos.length) {
|
||||
fail(
|
||||
"Zillow photo extraction failed.",
|
||||
clickError || "No Zillow image URLs were found on the rendered listing page."
|
||||
);
|
||||
}
|
||||
|
||||
const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true;
|
||||
const notes = [];
|
||||
if (clickedLabel) {
|
||||
notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs.");
|
||||
} else {
|
||||
notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path.");
|
||||
}
|
||||
if (clickError) {
|
||||
notes.push(`All-photos click path was not required: ${clickError}`);
|
||||
}
|
||||
const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true;
|
||||
const notes = [];
|
||||
if (clickedLabel) {
|
||||
notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs.");
|
||||
} else {
|
||||
notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path.");
|
||||
}
|
||||
if (clickError) {
|
||||
notes.push(`All-photos click path was not required: ${clickError}`);
|
||||
}
|
||||
|
||||
const result = {
|
||||
source: "zillow",
|
||||
requestedUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
clickedLabel,
|
||||
expectedPhotoCount,
|
||||
complete,
|
||||
photoCount: photos.length,
|
||||
imageUrls: photos.map((photo) => photo.url),
|
||||
notes,
|
||||
};
|
||||
|
||||
await context.close();
|
||||
return result;
|
||||
return {
|
||||
source: "zillow",
|
||||
requestedUrl,
|
||||
finalUrl: page.url(),
|
||||
title: await page.title(),
|
||||
clickedLabel,
|
||||
expectedPhotoCount,
|
||||
complete,
|
||||
photoCount: photos.length,
|
||||
imageUrls: photos.map((photo) => photo.url),
|
||||
notes,
|
||||
};
|
||||
},
|
||||
{
|
||||
onTimeout: closeContext
|
||||
}
|
||||
);
|
||||
} catch (error) {
|
||||
try {
|
||||
await context.close();
|
||||
} catch {
|
||||
// Ignore close errors after the primary failure.
|
||||
}
|
||||
throw new Error(error instanceof Error ? error.message : String(error));
|
||||
} finally {
|
||||
await closeContext();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user