feat(amazon-shopping): scaffold amazon product search skill
@@ -0,0 +1,47 @@
# Amazon Data Map

Use this reference when deciding which visible Amazon fields can be reported by `amazon-shopping`.

## Product Search Fields

Search result cards should be treated as candidates, not final truth. Prefer cards with a non-empty `data-asin` value. Extract only visible data from the rendered search page:

| Output field | Search-page source | Notes |
|---|---|---|
| `asin` | `data-asin` on result card | Required for normalized detail links. |
| `title` | product heading or product link text | Trim sponsored/accessibility boilerplate. |
| `url` | product link | Normalize to `https://www.amazon.com/dp/<ASIN>` when safe. |
| `imageUrl` | visible product image `src` | Optional. |
| `price` | visible `.a-price` text | Do not infer absent prices from snippets. |
| `rating` | `aria-label` or visible text like `4.6 out of 5 stars` | Optional. |
| `reviewCount` | text near rating like `1,234 ratings` | Optional. |
| `delivery.display` | visible delivery promise text | Optional and ZIP/session dependent. |
| `isSponsored` | visible sponsored marker | Sponsored results may be included but must be labeled. |
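
The text-to-value conversions in the table above could look roughly like the following. This is a minimal sketch, not the skill's actual helper: the function names are hypothetical, and the 10-character uppercase alphanumeric ASIN shape is an assumption used here to decide when URL normalization is "safe".

```typescript
// Assumed ASIN shape (10 uppercase letters/digits); not stated in this reference.
const ASIN_PATTERN = /^[A-Z0-9]{10}$/;

// "4.6 out of 5 stars" -> 4.6; anything else -> null (never guess).
function parseRating(text: string | null): number | null {
  const m = text?.match(/(\d(?:\.\d+)?) out of 5 stars/);
  return m ? Number(m[1]) : null;
}

// "1,234 ratings" -> 1234; anything else -> null.
function parseReviewCount(text: string | null): number | null {
  const m = text?.match(/(\d[\d,]*)\s+ratings?/);
  return m ? Number(m[1].replace(/,/g, "")) : null;
}

// Build the normalized detail link only when the ASIN looks valid.
function normalizeDetailUrl(asin: string): string | null {
  return ASIN_PATTERN.test(asin) ? `https://www.amazon.com/dp/${asin}` : null;
}
```

Returning `null` instead of a default keeps absent or unparseable card text distinguishable from a real value.
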

## Detail Page Fields

Open only normalized product detail URLs under `/dp/<ASIN>` or `/gp/product/<ASIN>`. Extract visible fields:

| Output field | Detail-page source | Notes |
|---|---|---|
| `title` | `#productTitle` or equivalent heading | Detail title can replace search title. |
| `price` | buy-box/current price selectors | Variant pages can omit price. |
| `delivery` | delivery message near buy box | Report as text, not guaranteed. |
| `availability` | availability block | Optional. |
| `seller` | seller/ships-from visible text | Optional. |
| `bullets` | feature bullets list | Trim empty and hidden items. |
| `specs` | product overview/details/technical tables | Preserve name/value pairs. |
| `starBreakdown` | visible customer-review histogram | Percent or count basis only. Do not crawl review pages. |

## Filter Semantics

- `over 200 reviews` means `reviewCount > 200`.
- `at least 200 reviews` means `reviewCount >= 200`.
- `more than 4.5 stars` means `rating > 4.5`.
- `4.5 stars or better` means `rating >= 4.5`.
- `less than $4 each` is evaluated against the visible unit price first; if none is shown, against a unit price derived from a high-confidence unit-count inference. Unknown unit prices do not pass strict unit-price filters.
- Missing fields must be represented as `null` or noted in `missingFields` / `extractionNotes`; never fabricate values.
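
The comparison rules above can be sketched as a small predicate. The names (`passes`, `resolveUnitPrice`, `inferredUnitCount`) are hypothetical, and treating `null` as failing every strict filter is an assumption that generalizes the unit-price rule to the other fields.

```typescript
type Comparison = ">" | ">=" | "<" | "<=";

// Unknown (null) values never satisfy a strict filter in this sketch.
function passes(value: number | null, op: Comparison, threshold: number): boolean {
  if (value === null) return false;
  switch (op) {
    case ">": return value > threshold;
    case ">=": return value >= threshold;
    case "<": return value < threshold;
    case "<=": return value <= threshold;
  }
}

interface PricedCandidate {
  price: number | null;
  visibleUnitPrice: number | null;
  inferredUnitCount: number | null; // set only when the inference is high-confidence
}

// Prefer the visible unit price; fall back to price / unit count; else unknown.
function resolveUnitPrice(c: PricedCandidate): number | null {
  if (c.visibleUnitPrice !== null) return c.visibleUnitPrice;
  if (c.price !== null && c.inferredUnitCount !== null && c.inferredUnitCount > 0) {
    return c.price / c.inferredUnitCount;
  }
  return null;
}
```

For example, `over 200 reviews` maps to `passes(reviewCount, ">", 200)` and `less than $4 each` maps to `passes(resolveUnitPrice(candidate), "<", 4)`.
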

## Official Alternatives

Amazon Business Product Search API and Product Advertising API are official API paths for structured product data when the operator has credentials. This skill uses bounded web automation because the current install request requires `web-automation` scraping.
@@ -0,0 +1,39 @@
# Compliance And Failure Modes

This reference is operational guidance, not legal advice. The operator is responsible for making sure a run complies with Amazon terms, robots directives, local law, and account obligations.

## Required Guardrails

- Fetch and evaluate `https://www.amazon.com/robots.txt` before live scraping planned Amazon paths.
- Stop if the effective rules disallow the planned search or detail paths.
- Do not automate sign-in, checkout, cart, wishlist, review submission, customer-review pages, reviewer profiles, or any disallowed path.
- Do not bypass CAPTCHA, bot checks, blocked pages, or access-denied pages.
- Do not print cookies, profile state, session storage, or account/location-specific browser data.
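
One way to evaluate the first two guardrails is sketched below. It is a deliberately small robots.txt evaluator, not a full implementation of the Robots Exclusion Protocol: it reads only the `User-agent: *` group, supports `*` and a trailing `$`, and picks the longest matching rule (with `Allow` winning ties). The function names are hypothetical, and any sample rules fed to it are illustrative rather than copied from Amazon's file.

```typescript
interface RobotsRule { allow: boolean; pattern: string }

// Collect Allow/Disallow rules from groups that include "User-agent: *".
function parseStarGroup(robotsTxt: string): RobotsRule[] {
  const rules: RobotsRule[] = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim();
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      if (!lastWasAgent) inStarGroup = false; // a new group begins
      if (value === "*") inStarGroup = true;
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (inStarGroup && (key === "allow" || key === "disallow") && value !== "") {
        rules.push({ allow: key === "allow", pattern: value });
      }
    }
  }
  return rules;
}

// Translate a robots pattern into a regular expression anchored at the path start.
function ruleMatches(pattern: string, path: string): boolean {
  const anchored = pattern.endsWith("$");
  const core = anchored ? pattern.slice(0, -1) : pattern;
  const source = core
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : "")).test(path);
}

// Longest matching rule wins; no matching rule means the path is allowed.
function isPathAllowed(rules: RobotsRule[], path: string): boolean {
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!ruleMatches(rule.pattern, path)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieAllow = !!best && rule.pattern.length === best.pattern.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true;
}
```

A run would call `isPathAllowed` for each planned search and detail path and stop before any live request if one returns `false`.
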

## Allowed Scope

Allowed behavior is bounded read-only product research over search result pages and normalized product detail pages:

- `/s?k=<query>` search results.
- `/dp/<ASIN>` product details.
- `/gp/product/<ASIN>` product details.

Review data is limited to visible summary ratings/counts and visible histogram rows on search/detail pages. Do not navigate to `/product-reviews`, `/review`, `/gp/customer-reviews`, or review AJAX endpoints.
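
The scope above lends itself to an allowlist check run before every navigation. The sketch below is illustrative: `isInScope` is a hypothetical name, the 10-character ASIN pattern is an assumption, and restricting the host to `www.amazon.com` reflects the only host this reference mentions.

```typescript
// Detail paths permitted by the allowed scope (ASIN shape is an assumption).
const DETAIL_PATHS: RegExp[] = [
  /^\/dp\/[A-Z0-9]{10}\/?$/,
  /^\/gp\/product\/[A-Z0-9]{10}\/?$/,
];

// True only for search URLs with a k= query and normalized detail URLs.
function isInScope(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are never in scope
  }
  if (url.protocol !== "https:" || url.hostname !== "www.amazon.com") return false;
  if (url.pathname === "/s") return url.searchParams.has("k");
  return DETAIL_PATHS.some((pattern) => pattern.test(url.pathname));
}
```

Because the check is an allowlist, review, cart, and sign-in paths are rejected without needing to be enumerated.
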

## Failure Modes

Return a structured warning and do not claim success when any of these happen:

- CAPTCHA or bot-check page.
- Sign-in wall.
- HTTP 429 or 503 that remains after the bounded retry budget.
- Robots rules disallow a planned path.
- Product markup changes enough that required fields cannot be found.
- Amazon returns localized, personalized, or ZIP/session-dependent delivery text that cannot be verified.
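
A structured warning for the cases above might be produced by a decision function like the one below. All type names, field names, and warning kinds are hypothetical; the sketch also leaves out the unverifiable-delivery-text case, which needs field-level rather than page-level handling.

```typescript
type FailureKind =
  | "robots_disallowed"
  | "challenge"
  | "sign_in_wall"
  | "rate_limited"
  | "markup_changed";

interface PageObservation {
  status: number;
  robotsAllowed: boolean;
  challengeDetected: boolean;
  signInWall: boolean;
  missingRequiredFields: string[];
  retriesUsed: number;
}

type Decision =
  | { action: "proceed" }
  | { action: "retry" }
  | { action: "stop"; warning: { kind: FailureKind; detail: string } };

function stop(kind: FailureKind, detail: string): Decision {
  return { action: "stop", warning: { kind, detail } };
}

// Map one page observation to proceed / retry / stop-with-warning.
function decide(obs: PageObservation, retryBudget: number): Decision {
  if (!obs.robotsAllowed) return stop("robots_disallowed", "planned path is disallowed");
  if (obs.challengeDetected) return stop("challenge", "CAPTCHA or bot check shown");
  if (obs.signInWall) return stop("sign_in_wall", "sign-in required");
  if (obs.status === 429 || obs.status === 503) {
    // Retry only while the bounded budget has room; never loop indefinitely.
    return obs.retriesUsed < retryBudget
      ? { action: "retry" }
      : stop("rate_limited", `HTTP ${obs.status} after ${obs.retriesUsed} retries`);
  }
  if (obs.missingRequiredFields.length > 0) {
    return stop("markup_changed", `missing: ${obs.missingRequiredFields.join(", ")}`);
  }
  return { action: "proceed" };
}
```

Challenges and sign-in walls stop immediately and are never retried, which keeps the helper from working around bot checks.
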

## Output Rules

- Unknown fields stay unknown.
- Partial extraction is acceptable only when the response includes warnings and missing-field notes.
- Sponsored products can be returned by default but must be labeled.
- Counts above 30 require operator confirmation or batch splitting.
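
Two of these rules are mechanical enough to sketch: reporting which fields are unknown, and splitting a large request into batches. The helper names and the flat record shape are assumptions; the limit of 30 is taken from the rule above.

```typescript
type ProductRecord = Record<string, string | number | boolean | null>;

// Names of fields that are unknown (null), for a missingFields note.
function listMissingFields(record: ProductRecord): string[] {
  return Object.keys(record).filter((key) => record[key] === null);
}

// Split a requested count into batch sizes no larger than maxPerBatch.
function splitIntoBatches(requested: number, maxPerBatch = 30): number[] {
  const batches: number[] = [];
  for (let remaining = requested; remaining > 0; remaining -= maxPerBatch) {
    batches.push(Math.min(remaining, maxPerBatch));
  }
  return batches;
}
```

A request for 75 products would become batches of 30, 30, and 15 under this sketch; asking the operator for confirmation remains the alternative to splitting.
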
@@ -0,0 +1,27 @@
# Web-Automation Prompts

Use these patterns when debugging or extending the `amazon-shopping` browser workflow. The TypeScript helper is the default interface; these prompts document the intended rendered-page behavior.

## Search Page

```text
Use the installed web-automation skill. Open https://www.amazon.com/s?k=<encoded query>. Wait for rendered search results. If a CAPTCHA, bot check, sign-in wall, or access denied page appears, stop and return a challenge status. Otherwise extract visible product cards with ASIN, title, product URL, displayed price, rating text, review count text, delivery text, sponsor marker, and image URL. Do not open cart, sign-in, wishlist, or review-listing pages.
```

## Detail Page

```text
Open candidate product detail URLs under /dp/<ASIN> or /gp/product/<ASIN> one at a time, pacing requests slowly. From each page, extract title, canonical link, visible buy-box/current price, delivery summary, availability, seller when visible, feature bullets, product details/specification table rows, rating score, review count, and visible customer-review histogram percentages if present on the detail page. Do not navigate to /product-reviews or reviewer profile pages.
```

## Pagination

```text
Follow only the visible Amazon pagination control for the next search page, or construct page=<n> only after the current page exposes normal search results and no challenge/block. Stop when enough candidates have been collected, no next page exists, a challenge appears, or maxSearchPages is reached.
```
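
The stop conditions in the pagination prompt can be expressed as a single check evaluated after each search page. This is a sketch under assumptions: only `maxSearchPages` comes from the prompt, while the other field names and the reason strings are hypothetical.

```typescript
interface PaginationState {
  collected: number;        // candidates gathered so far
  wanted: number;           // candidates requested
  pagesVisited: number;
  maxSearchPages: number;
  hasNextPage: boolean;     // a visible "next" control exists
  challengeDetected: boolean;
}

type StopReason = "challenge" | "enough_candidates" | "max_search_pages" | "no_next_page";

// Returns why pagination must stop, or null when the next page may be opened.
function stopReason(state: PaginationState): StopReason | null {
  if (state.challengeDetected) return "challenge";
  if (state.collected >= state.wanted) return "enough_candidates";
  if (state.pagesVisited >= state.maxSearchPages) return "max_search_pages";
  if (!state.hasNextPage) return "no_next_page";
  return null;
}
```

Checking for a challenge first means a blocked page is reported as such even when the candidate target happens to be met.
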

## Robustness Notes

- Prefer Playwright locator/actionability behavior and bounded waits over fixed sleeps.
- Never follow sponsored redirect URLs, sign-in links, cart links, wishlist links, or review-page links.
- Return partial results with warnings when Amazon markup changes or fields are hidden.