Harden assessor fallback after Zillow photo failure

This commit is contained in:
2026-03-28 03:17:51 -05:00
parent 3335e96d35
commit 54854edfc6
5 changed files with 115 additions and 74 deletions

View File

@@ -76,6 +76,7 @@ Operational rule:
- In WhatsApp or similar messaging runs, keep the core analysis on `web-automation` plus `web_fetch`. Treat `web_search` as a narrow fallback for alternate-URL discovery only, not the default path for Zillow/HAR/CAD work.
- Do not start Zillow/HAR property discovery or photo review from Brave-backed `web_search` when `web-automation` can open the candidate listing source directly.
- For CAD/public-record lookup, prefer official assessor/CAD pages via `web_fetch` first and `web-automation` second if the site needs rendered interaction.
- In Texas runs, do not use `https://www.texas.gov/propertytaxes/search/` as the CAD lookup path; use the address-first CAD/helper path or the discovered county CAD pages directly.
- In those messaging runs, reserve subprocess use for a single final `render-report` attempt after the verdict and fair-value range are complete.
- In those messaging runs, do not start Gmail/email-send skill discovery or delivery tooling until the report content is complete and the PDF is ready to render or already rendered.
- Property-assessor delivery emails should be sent as Luke from Luke's Google Workspace account, while still delivering to the user-specified destination.
@@ -83,6 +84,8 @@ Operational rule:
- Do not route property-assessor delivery through generic `gog` or the Stefano helper `node ~/.openclaw/workspace/integrations/google-workspace/gw.js`.
- If the agent needs to confirm Luke auth before sending, use `zsh ~/.openclaw/workspace/bin/gog-luke auth list --check --plain`.
- Treat a silent helper as a failed helper in messaging runs. If a helper produces no useful output within a short bound, abandon it and continue with the chat-native path instead of repeatedly polling it.
- If Zillow photo extraction fails, immediately continue with HAR photo fallback or the next available rendered listing/photo source rather than stopping the assessment.
- After a Zillow/HAR photo miss, continue the comp and CAD/public-record work in the same run. A photo-source miss is a fallback event, not a terminal state.
- If the original request already authorized sending the finished PDF to a stated email address, do not pause for a redundant send-confirmation prompt after rendering.
- If final PDF render/send fails, return the completed decision-grade report in chat and report delivery failure separately rather than restarting the whole assessment.

View File

@@ -60,6 +60,7 @@ Rules:
- In WhatsApp or similar messaging runs, keep the core assessment on `web-automation` plus `web_fetch`. Treat `web_search` as a fallback discovery aid, not the primary property-analysis path.
- For Zillow/HAR property discovery, photo extraction, and rendered listing review, do **not** start with Brave-backed `web_search` when `web-automation` can open the candidate source directly.
- For CAD/public-record enrichment, prefer official assessor/CAD pages via `web_fetch` first. Use `web-automation` when the official site needs rendered interaction. Do **not** start CAD lookup from generic web-search snippets when the official site is already known or derivable from the address.
- In Texas runs, do **not** use `https://www.texas.gov/propertytaxes/search/` as the CAD lookup path. Use the address-first CAD/helper path or the discovered county CAD pages directly.
- In those messaging runs, do **not** make `scripts/property-assessor assess`, `scripts/property-assessor locate-public-records`, `node zillow-discover.js`, `node har-discover.js`, `node zillow-photos.js`, `node har-photos.js`, `curl`, or `wget` the default core-analysis path.
- From messaging runs, the only subprocess-style step you should attempt by default is the final `scripts/property-assessor render-report` call after the verdict, fair-value range, and report body are complete.
- Do **not** inspect Gmail/email-send skills, mail tooling, or delivery integrations until the assessment is complete and the PDF is either already rendered or ready to render immediately.
@@ -68,6 +69,8 @@ Rules:
- Do **not** use generic `gog` or the Stefano helper `node ~/.openclaw/workspace/integrations/google-workspace/gw.js` for property-assessor report delivery.
- If you need to confirm Luke auth before sending, use `zsh ~/.openclaw/workspace/bin/gog-luke auth list --check --plain`.
- A silent helper is a failed helper in messaging runs. If a background helper produces no useful stdout/stderr and no result within a short bound, stop polling it, treat that path as failed, and continue on the chat-native assessment path instead of narrating that it is still chewing.
- If Zillow photo extraction fails, immediately continue with HAR photo fallback or the next available rendered listing/photo source. Do **not** stop the assessment because one Zillow photo subprocess failed.
- After a Zillow/HAR photo miss, continue the comp and CAD/public-record work in the same run. A photo-source miss is a fallback event, not a terminal state.
- Do **not** leave the user parked behind background helper polling. If a helper has not produced a result quickly, give a concise status update and continue the assessment with the next available non-helper path.
- If the user already instructed you to email/send the finished PDF to a specific target, do **not** ask for a second send confirmation after rendering. Render, send, and report the result.
- If the final PDF render fails, return the complete decision-grade report in chat and say the render/send step failed. Do not restart the whole assessment.

View File

@@ -39,6 +39,11 @@ export function sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
export function isPageClosedError(error) {
const message = error instanceof Error ? error.message : String(error || "");
return /Target page, context or browser has been closed|Execution context was destroyed/i.test(message);
}
export async function runWithOperationTimeout(
operationName,
operation,

View File

@@ -1,7 +1,11 @@
import test from "node:test";
import assert from "node:assert/strict";
import { normalizeImageCandidates, runWithOperationTimeout } from "./real-estate-photo-common.js";
import {
isPageClosedError,
normalizeImageCandidates,
runWithOperationTimeout,
} from "./real-estate-photo-common.js";
test("normalizeImageCandidates keeps distinct Zillow photo URLs and strips query strings", () => {
const result = normalizeImageCandidates(
@@ -85,3 +89,15 @@ test("runWithOperationTimeout rejects stalled work and runs timeout cleanup", as
assert.equal(cleanedUp, true);
});
test("isPageClosedError detects transient browser-session closure errors", () => {
assert.equal(
isPageClosedError(new Error("page.evaluate: Target page, context or browser has been closed")),
true
);
assert.equal(
isPageClosedError(new Error("Execution context was destroyed, most likely because of a navigation")),
true
);
assert.equal(isPageClosedError(new Error("No Zillow image URLs were found")), false);
});

View File

@@ -8,6 +8,7 @@ import {
dismissCommonOverlays,
fail,
gotoListing,
isPageClosedError,
normalizeImageCandidates,
parseTarget,
runWithOperationTimeout,
@@ -107,85 +108,98 @@ async function collectZillowStructuredPhotoCandidates(page) {
export async function extractZillowPhotos(rawUrl, options = {}) {
const requestedUrl = parseTarget(rawUrl);
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
const closeContext = async () => {
await context.close().catch(() => {});
};
const maxAttempts = 2;
let lastError = null;
try {
return await runWithOperationTimeout(
"Zillow photo extraction",
async () => {
await gotoListing(page, requestedUrl);
await dismissCommonOverlays(page);
for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
const { context, page } = await createPageSession({ headless: process.env.HEADLESS !== "false" });
const closeContext = async () => {
await context.close().catch(() => {});
};
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
const beforeUrl = page.url();
let clickedLabel = null;
let clickError = null;
try {
return await runWithOperationTimeout(
"Zillow photo extraction",
async () => {
await gotoListing(page, requestedUrl);
await dismissCommonOverlays(page);
try {
clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS);
await waitForPhotoExperience(page, beforeUrl);
await scrollUntilSettled(page);
await sleep(1200);
} catch (error) {
clickError = error instanceof Error ? error.message : String(error);
const expectedPhotoCount = await getAnnouncedPhotoCount(page);
const beforeUrl = page.url();
let clickedLabel = null;
let clickError = null;
try {
clickedLabel = await clickPhotoEntryPoint(page, ZILLOW_LABELS);
await waitForPhotoExperience(page, beforeUrl);
await scrollUntilSettled(page);
await sleep(1200);
} catch (error) {
clickError = error instanceof Error ? error.message : String(error);
}
const [structuredCandidates, renderedCandidates] = await Promise.all([
collectZillowStructuredPhotoCandidates(page),
collectZillowPhotoCandidates(page),
]);
const candidates = [...structuredCandidates, ...renderedCandidates];
const normalized = normalizeImageCandidates(candidates, {
hostIncludes: ["photos.zillowstatic.com"],
minWidth: 240,
minHeight: 180,
});
const photos = collapseZillowPhotos(normalized);
if (!photos.length) {
fail(
"Zillow photo extraction failed.",
clickError || "No Zillow image URLs were found on the rendered listing page."
);
}
const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true;
const notes = [];
if (clickedLabel) {
notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs.");
} else {
notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path.");
}
if (clickError) {
notes.push(`All-photos click path was not required: ${clickError}`);
}
if (attempt > 1) {
notes.push(`Recovered after retrying Zillow photo extraction once because the first browser session closed unexpectedly.`);
}
return {
source: "zillow",
requestedUrl,
finalUrl: page.url(),
title: await page.title(),
clickedLabel,
expectedPhotoCount,
complete,
photoCount: photos.length,
imageUrls: photos.map((photo) => photo.url),
notes,
};
},
{
timeoutMs: Number(options.timeoutMs || 0) || undefined,
onTimeout: closeContext
}
const [structuredCandidates, renderedCandidates] = await Promise.all([
collectZillowStructuredPhotoCandidates(page),
collectZillowPhotoCandidates(page),
]);
const candidates = [...structuredCandidates, ...renderedCandidates];
const normalized = normalizeImageCandidates(candidates, {
hostIncludes: ["photos.zillowstatic.com"],
minWidth: 240,
minHeight: 180,
});
const photos = collapseZillowPhotos(normalized);
if (!photos.length) {
fail(
"Zillow photo extraction failed.",
clickError || "No Zillow image URLs were found on the rendered listing page."
);
}
const complete = expectedPhotoCount ? photos.length >= expectedPhotoCount : true;
const notes = [];
if (clickedLabel) {
notes.push("Opened Zillow all-photos flow and extracted direct Zillow image URLs.");
} else {
notes.push("The rendered Zillow listing shell already exposed the Zillow photo stream, so extraction completed without relying on the all-photos click path.");
}
if (clickError) {
notes.push(`All-photos click path was not required: ${clickError}`);
}
return {
source: "zillow",
requestedUrl,
finalUrl: page.url(),
title: await page.title(),
clickedLabel,
expectedPhotoCount,
complete,
photoCount: photos.length,
imageUrls: photos.map((photo) => photo.url),
notes,
};
},
{
timeoutMs: Number(options.timeoutMs || 0) || undefined,
onTimeout: closeContext
);
} catch (error) {
lastError = error;
if (!(attempt < maxAttempts && isPageClosedError(error))) {
throw new Error(error instanceof Error ? error.message : String(error));
}
);
} catch (error) {
throw new Error(error instanceof Error ? error.message : String(error));
} finally {
await closeContext();
} finally {
await closeContext();
}
}
throw new Error(lastError instanceof Error ? lastError.message : String(lastError));
}
async function main() {