feat(amazon-shopping): scaffold amazon product search skill

2026-04-15 18:24:13 -05:00
parent 26a968797c
commit 8ad532545d
14 changed files with 1234 additions and 0 deletions
@@ -0,0 +1,47 @@
# Amazon Data Map
Use this reference when deciding which visible Amazon fields can be reported by `amazon-shopping`.
## Product Search Fields
Search result cards should be treated as candidates, not final truth. Prefer cards with a non-empty `data-asin` value. Extract only visible data from the rendered search page:
| Output field | Search-page source | Notes |
|---|---|---|
| `asin` | `data-asin` on result card | Required for normalized detail links. |
| `title` | product heading or product link text | Trim sponsored/accessibility boilerplate. |
| `url` | product link | Normalize to `https://www.amazon.com/dp/<ASIN>` when safe. |
| `imageUrl` | visible product image `src` | Optional. |
| `price` | visible `.a-price` text | Do not infer absent prices from snippets. |
| `rating` | `aria-label` or visible text like `4.6 out of 5 stars` | Optional. |
| `reviewCount` | text near rating like `1,234 ratings` | Optional. |
| `delivery.display` | visible delivery promise text | Optional and ZIP/session dependent. |
| `isSponsored` | visible sponsored marker | Sponsored results may be included but must be labeled. |
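
The field rules above can be sketched in TypeScript as a small normalization step. This is a minimal sketch, not the skill's actual helper: `RawCard`, `normalizeCard`, and the parser names are hypothetical, and the 10-character ASIN pattern is an assumption used to decide when a normalized link is "safe".

```typescript
// Hypothetical raw strings read from one result card; null means "not visible".
interface RawCard {
  asin: string;
  ratingText: string | null;
  reviewCountText: string | null;
}

// Assumption: a well-formed ASIN is 10 uppercase alphanumeric characters.
const ASIN_PATTERN = /^[A-Z0-9]{10}$/;

// Build the normalized detail link only when the ASIN looks well-formed.
function normalizeProductUrl(asin: string): string | null {
  const candidate = asin.trim();
  return ASIN_PATTERN.test(candidate) ? `https://www.amazon.com/dp/${candidate}` : null;
}

// "4.6 out of 5 stars" -> 4.6; any other text -> null.
function parseRating(text: string | null): number | null {
  const match = /(\d(?:\.\d+)?)\s+out of\s+5\s+stars/i.exec(text ?? "");
  const value = match?.[1];
  return value === undefined ? null : Number(value);
}

// "1,234 ratings" -> 1234; abbreviated counts such as "1.2K" stay null.
function parseReviewCount(text: string | null): number | null {
  const match = /^\s*\(?(\d{1,3}(?:,\d{3})+|\d+)\)?\s*(?:ratings?|reviews?)?\s*$/i.exec(text ?? "");
  const digits = match?.[1];
  return digits === undefined ? null : Number(digits.replace(/,/g, ""));
}

// Combine the helpers; absent values stay null instead of being guessed.
function normalizeCard(card: RawCard) {
  return {
    asin: card.asin,
    url: normalizeProductUrl(card.asin),
    rating: parseRating(card.ratingText),
    reviewCount: parseReviewCount(card.reviewCountText),
  };
}
```

Returning `null` for anything the parsers do not recognize keeps absent or oddly formatted values out of the output rather than inferring them.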
## Detail Page Fields
Open only normalized product detail URLs under `/dp/<ASIN>` or `/gp/product/<ASIN>`. Extract visible fields:
| Output field | Detail-page source | Notes |
|---|---|---|
| `title` | `#productTitle` or equivalent heading | Detail title can replace search title. |
| `price` | buy-box/current price selectors | Variant pages can omit price. |
| `delivery` | delivery message near buy box | Report as displayed text; it is not a guarantee. |
| `availability` | availability block | Optional. |
| `seller` | seller/ships-from visible text | Optional. |
| `bullets` | feature bullets list | Trim empty and hidden items. |
| `specs` | product overview/details/technical tables | Preserve name/value pairs. |
| `starBreakdown` | visible customer-review histogram | Report only the percentages or counts shown on the page. Do not crawl review pages. |
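
As an illustration of the `bullets` and `specs` notes, the following sketch trims empty bullets and keeps specification rows as name/value pairs. `SpecPair`, `cleanBullets`, and `toSpecPairs` are hypothetical names, and the inputs are assumed to be text already read from visible elements; filtering hidden elements has to happen in the browser step and is not shown.

```typescript
// Hypothetical name/value shape for one specification row.
interface SpecPair {
  name: string;
  value: string;
}

// Collapse whitespace and drop bullets that end up empty.
function cleanBullets(rawBullets: string[]): string[] {
  return rawBullets
    .map((bullet) => bullet.replace(/\s+/g, " ").trim())
    .filter((bullet) => bullet.length > 0);
}

// Keep each table row as a name/value pair; rows missing either side are dropped.
function toSpecPairs(rows: Array<[string, string]>): SpecPair[] {
  return rows
    .map(([name, value]) => ({ name: name.trim(), value: value.trim() }))
    .filter((pair) => pair.name.length > 0 && pair.value.length > 0);
}
```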
## Filter Semantics
- `over 200 reviews` means `reviewCount > 200`.
- `at least 200 reviews` means `reviewCount >= 200`.
- `more than 4.5 stars` means `rating > 4.5`.
- `4.5 stars or better` means `rating >= 4.5`.
- `less than $4 each` means a per-unit price below $4. Use the visible unit price first, and fall back to a unit price inferred from a high-confidence unit count only when none is shown. Unknown unit prices do not pass strict unit-price filters.
- Missing fields must be represented as `null` or noted in `missingFields` / `extractionNotes`; never fabricate values.
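
The strict versus inclusive comparisons above could be encoded as follows. This is a sketch under stated assumptions: `comparisonFromPhrase` and `passes` are hypothetical helpers, only the phrasings listed above are recognized, and treating a `null` rating or review count as failing is an assumption that extends the unit-price rule to the other numeric fields.

```typescript
type Comparison = ">" | ">=" | "<";

// Map the phrasings from the list above to a comparison operator.
function comparisonFromPhrase(phrase: string): Comparison | null {
  const text = phrase.toLowerCase();
  if (/\b(?:at least|or better)\b/.test(text)) return ">=";
  if (/\b(?:over|more than)\b/.test(text)) return ">";
  if (/\bless than\b/.test(text)) return "<";
  return null; // unrecognized phrasing: let the caller decide, do not guess
}

// Apply the comparison; unknown (null) values never pass.
function passes(actual: number | null, op: Comparison, threshold: number): boolean {
  if (actual === null) return false;
  switch (op) {
    case ">":
      return actual > threshold;
    case ">=":
      return actual >= threshold;
    case "<":
      return actual < threshold;
  }
}
```

For example, `passes(200, ">", 200)` is `false` for `over 200 reviews`, while `passes(200, ">=", 200)` is `true` for `at least 200 reviews`.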
## Official Alternatives
The Amazon Business Product Search API and the Product Advertising API are the official routes to structured product data when the operator has credentials. This skill uses bounded web automation instead because the current install request requires `web-automation` scraping.
@@ -0,0 +1,39 @@
# Compliance And Failure Modes
This reference is operational guidance, not legal advice. The operator is responsible for making sure a run complies with Amazon terms, robots directives, local law, and account obligations.
## Required Guardrails
- Fetch and evaluate `https://www.amazon.com/robots.txt` before live scraping planned Amazon paths.
- Stop if the effective rules disallow the planned search or detail paths.
- Do not automate sign-in, checkout, cart, wishlist, review submission, customer-review pages, reviewer profiles, or any disallowed path.
- Do not bypass CAPTCHA, bot checks, blocked pages, or access-denied pages.
- Do not print cookies, profile state, session storage, or account/location-specific browser data.
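
One way to implement the robots check in the first two guardrails is sketched below. It is a simplified evaluator, assuming longest-match precedence with `Allow` winning ties and support for `*` and a trailing `$`; `parseRobots` and `isPathAllowed` are hypothetical names, and a maintained robots parser may be preferable in practice. Fetching the file is left to the caller.

```typescript
interface RobotsRule {
  allow: boolean;
  pattern: string;
}

// Collect Allow/Disallow rules from the groups that name the given agent.
function parseRobots(robotsTxt: string, agent = "*"): RobotsRule[] {
  const rules: RobotsRule[] = [];
  let groupAgents: string[] = [];
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      if (!lastWasAgent) groupAgents = []; // a new group starts here
      groupAgents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    const isRule = key === "allow" || key === "disallow";
    if (isRule && value !== "" && groupAgents.includes(agent.toLowerCase())) {
      rules.push({ allow: key === "allow", pattern: value });
    }
  }
  return rules;
}

// Translate a robots pattern ("*" wildcard, optional trailing "$") to a RegExp.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Longest matching pattern wins; Allow wins ties; no match means allowed.
function isPathAllowed(rules: RobotsRule[], path: string): boolean {
  let best: RobotsRule | null = null;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieAllow = best !== null && rule.pattern.length === best.pattern.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null ? true : best.allow;
}
```

A run would call `isPathAllowed` for each planned search and detail path and stop before any navigation if one of them returns `false`.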
## Allowed Scope
Allowed behavior is bounded read-only product research over search result pages and normalized product detail pages:
- `/s?k=<query>` search results.
- `/dp/<ASIN>` product details.
- `/gp/product/<ASIN>` product details.
Review data is limited to visible summary ratings/counts and visible histogram rows on search/detail pages. Do not navigate to `/product-reviews`, `/review`, `/gp/customer-reviews`, or review AJAX endpoints.
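
The allowed scope can be enforced with an allow-list check before any navigation. The sketch below is illustrative: `classifyAmazonUrl` is a hypothetical helper, and the 10-character ASIN pattern is an assumption. Anything that is not a search or normalized detail URL, including every review path, is rejected by default.

```typescript
type AllowedKind = "search" | "detail";

// Assumption: normalized detail paths carry a 10-character alphanumeric ASIN.
const DETAIL_PATH = /^\/(?:dp|gp\/product)\/[A-Z0-9]{10}\/?$/;

// Return the kind of allowed page, or null when the URL is out of scope.
function classifyAmazonUrl(rawUrl: string): AllowedKind | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "www.amazon.com") return null;
  const query = url.searchParams.get("k") ?? "";
  if (url.pathname === "/s" && query !== "") return "search";
  if (DETAIL_PATH.test(url.pathname)) return "detail";
  return null; // cart, sign-in, wishlist, review pages, and everything else
}
```

Using an allow-list rather than a block-list means new or unfamiliar Amazon paths are refused until someone explicitly adds them.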
## Failure Modes
Return a structured warning and do not claim success when any of these happen:
- CAPTCHA or bot-check page.
- Sign-in wall.
- HTTP 429 or 503 that persists after the bounded retry budget is exhausted.
- Robots rules disallow a planned path.
- Product markup changes enough that required fields cannot be found.
- Amazon returns localized, personalized, or ZIP/session-dependent delivery text that cannot be verified.
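
The bounded retry behavior for HTTP 429 and 503 might look like the decision function below. The budget of three attempts, the exponential backoff, and the names `decideAfterResponse` and `RetryDecision` are all assumptions for illustration; this reference does not specify the actual budget.

```typescript
type RetryDecision =
  | { action: "proceed" }
  | { action: "retry"; delayMs: number }
  | { action: "warn"; reason: string };

// Decide what to do after one response. `attempt` is 1-based.
function decideAfterResponse(
  status: number,
  attempt: number,
  maxAttempts = 3, // assumed budget
  baseDelayMs = 1000, // assumed base delay, doubled on each retry
): RetryDecision {
  const retryable = status === 429 || status === 503;
  if (!retryable) {
    return status >= 200 && status < 300
      ? { action: "proceed" }
      : { action: "warn", reason: `http_${status}` };
  }
  if (attempt < maxAttempts) {
    return { action: "retry", delayMs: baseDelayMs * 2 ** (attempt - 1) };
  }
  // Budget exhausted: surface a structured warning instead of claiming success.
  return { action: "warn", reason: `http_${status}_after_${attempt}_attempts` };
}
```

Keeping the decision separate from the network call makes the budget easy to test and keeps the caller responsible for actually waiting `delayMs` before the next attempt.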
## Output Rules
- Unknown fields stay unknown.
- Partial extraction is acceptable only when the response includes warnings and missing-field notes.
- Sponsored products can be returned by default but must be labeled.
- Counts above 30 require operator confirmation or batch splitting.
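
The last rule can be sketched as a batch planner. This assumes "counts" refers to the number of results requested in one run, which is an interpretation rather than something stated here; `planBatches` and `MAX_UNCONFIRMED_COUNT` are hypothetical names.

```typescript
// Assumed meaning of the rule: at most 30 results per run without confirmation.
const MAX_UNCONFIRMED_COUNT = 30;

// Return the batch sizes to run. A confirmed request is kept as a single batch.
function planBatches(requested: number, operatorConfirmed: boolean): number[] {
  if (!Number.isInteger(requested) || requested <= 0) {
    throw new RangeError("requested must be a positive integer");
  }
  if (requested <= MAX_UNCONFIRMED_COUNT || operatorConfirmed) return [requested];
  const batches: number[] = [];
  for (let remaining = requested; remaining > 0; remaining -= MAX_UNCONFIRMED_COUNT) {
    batches.push(Math.min(remaining, MAX_UNCONFIRMED_COUNT));
  }
  return batches;
}
```

Under these assumptions, a request for 75 results without confirmation is split into batches of 30, 30, and 15.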
@@ -0,0 +1,27 @@
# Web-Automation Prompts
Use these patterns when debugging or extending the `amazon-shopping` browser workflow. The TypeScript helper is the default interface; these prompts document the intended rendered-page behavior.
## Search Page
```text
Use the installed web-automation skill. Open https://www.amazon.com/s?k=<encoded query>. Wait for rendered search results. If a CAPTCHA, bot check, sign-in wall, or access denied page appears, stop and return a challenge status. Otherwise extract visible product cards with ASIN, title, product URL, displayed price, rating text, review count text, delivery text, sponsor marker, and image URL. Do not open cart, sign-in, wishlist, or review-listing pages.
```
## Detail Page
```text
For each candidate product detail URL under /dp/<ASIN> or /gp/product/<ASIN>, open the page slowly one at a time. Extract title, canonical link, visible buy-box/current price, delivery summary, availability, seller when visible, feature bullets, product details/specification table rows, rating score, review count, and visible customer-review histogram percentages if present on the detail page. Do not navigate to /product-reviews or reviewer profile pages.
```
## Pagination
```text
Follow only the visible Amazon pagination control for the next search page, or construct page=<n> only after the current page exposes normal search results and no challenge/block. Stop when enough candidates have been collected, no next page exists, a challenge appears, or maxSearchPages is reached.
```
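
The stop conditions in the pagination prompt can be expressed as a single check evaluated after each search page. This is a sketch: `PageState`, `StopReason`, and `stopReason` are hypothetical names, and only `maxSearchPages` comes from the prompt itself.

```typescript
// Hypothetical snapshot taken after a search page has been processed.
interface PageState {
  pageNumber: number; // 1-based index of the page just processed
  collected: number; // candidates gathered so far
  hasNextPage: boolean; // a visible "next" pagination control exists
  challengeDetected: boolean; // CAPTCHA, bot check, sign-in wall, or block
}

type StopReason = "challenge" | "enough_candidates" | "no_next_page" | "max_search_pages";

// Return why pagination must stop, or null when the next page may be opened.
function stopReason(state: PageState, wanted: number, maxSearchPages: number): StopReason | null {
  if (state.challengeDetected) return "challenge"; // checked first: never continue past a block
  if (state.collected >= wanted) return "enough_candidates";
  if (!state.hasNextPage) return "no_next_page";
  if (state.pageNumber >= maxSearchPages) return "max_search_pages";
  return null;
}
```

Checking the challenge flag first means a blocked page ends the run even when more candidates are still wanted.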
## Robustness Notes
- Prefer Playwright locator/actionability behavior and bounded waits over fixed sleeps.
- Never follow sponsored redirect URLs, sign-in links, cart links, wishlist links, or review-page links.
- Return partial results with warnings when Amazon markup changes or fields are hidden.