# Amazon Data Map
Use this reference when deciding which visible Amazon fields can be reported by `amazon-shopping`.
## Product Search Fields
Search result cards should be treated as candidates, not final truth. Prefer cards with a non-empty `data-asin` value. Extract only visible data from the rendered search page:

| Output field | Search-page source | Notes |
|---|---|---|
| `asin` | `data-asin` on result card | Required for normalized detail links. |
| `title` | product heading or product link text | Trim sponsored/accessibility boilerplate. |
| `url` | product link | Normalize to `https://www.amazon.com/dp/<ASIN>` when safe. |
| `imageUrl` | visible product image `src` | Optional. |
| `price` | visible `.a-price` text | Do not infer absent prices from snippets. |
| `rating` | `aria-label` or visible text like `4.6 out of 5 stars` | Optional. |
| `reviewCount` | text near rating like `1,234 ratings` | Optional. |
| `delivery.display` | visible delivery promise text | Optional and ZIP/session dependent. |
| `delivery.prime` | visible Prime badge, Prime icon class, `aria-label`, `alt`, or delivery text | Optional and ZIP/session dependent. Preserve a true search-card Prime signal when detail text omits the literal word Prime. |
| `isSponsored` | visible sponsored marker | Sponsored results may be included but must be labeled. |
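
The normalization rules in the table above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`normalize_url`, `parse_rating`, `parse_review_count`) are hypothetical, and the 10-character uppercase alphanumeric ASIN pattern is an assumption used here to decide when normalizing is "safe".

```python
import re
from typing import Optional

# Assumption: an ASIN is exactly 10 uppercase letters or digits.
ASIN_RE = re.compile(r"[A-Z0-9]{10}")


def normalize_url(asin: Optional[str]) -> Optional[str]:
    """Build the canonical detail URL, or return None when the ASIN looks unsafe."""
    if asin and ASIN_RE.fullmatch(asin):
        return f"https://www.amazon.com/dp/{asin}"
    return None


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Extract 4.6 from text such as '4.6 out of 5 stars'; None when absent."""
    match = re.search(r"(\d(?:\.\d)?) out of 5 stars", text or "")
    return float(match.group(1)) if match else None


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Extract 1234 from text such as '1,234 ratings'; None when absent."""
    match = re.search(r"(\d[\d,]*)\s+ratings?", text or "")
    return int(match.group(1).replace(",", "")) if match else None
```

Each helper returns `None` instead of guessing, so optional fields stay empty when the card does not show them.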
## Detail Page Fields
Open only normalized product detail URLs under `/dp/<ASIN>` or `/gp/product/<ASIN>`. Extract visible fields:

| Output field | Detail-page source | Notes |
|---|---|---|
| `title` | `#productTitle` or equivalent heading | Detail title can replace search title. |
| `price` | buy-box/current price selectors | Variant pages can omit price. |
| `delivery` | delivery message near buy box | Report as text, not guaranteed. |
| `availability` | availability block | Optional. |
| `seller` | seller/ships-from visible text | Optional. |
| `bullets` | feature bullets list | Trim empty and hidden items. |
| `specs` | product overview/details/technical tables | Preserve name/value pairs. |
| `starBreakdown` | visible customer-review histogram | Percent or count basis only. Do not crawl review pages. |
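
The rule about which detail URLs may be opened could be enforced with a small allowlist check like the one below. This is a sketch under assumptions: `detail_asin` is a hypothetical helper, the `https` and `www.amazon.com` restrictions are inferred from the normalized URL form used for search results, and the ASIN pattern is assumed to be 10 uppercase alphanumeric characters.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the path must start with /dp/<ASIN> or /gp/product/<ASIN>.
DETAIL_PATH_RE = re.compile(r"^/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def detail_asin(url: str) -> Optional[str]:
    """Return the ASIN when the URL is an allowed detail page, else None."""
    parsed = urlparse(url)
    # Assumption: only the https www.amazon.com origin is allowed.
    if parsed.scheme != "https" or parsed.hostname != "www.amazon.com":
        return None
    match = DETAIL_PATH_RE.match(parsed.path)
    return match.group(1) if match else None
```

A caller would skip any candidate for which `detail_asin(url)` returns `None` instead of opening the page.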
## Filter Semantics
- `over 200 reviews` means `reviewCount > 200`.
- `at least 200 reviews` means `reviewCount >= 200`.
- `more than 4.5 stars` means `rating > 4.5`.
- `4.5 stars or better` means `rating >= 4.5`.
- `less than $4 each` means the unit price must be `< 4`. Use the visible unit price first, then fall back to a high-confidence unit-count inference. Unknown unit prices do not pass strict unit-price filters.
- `77 inches or wider` means the overall product width must be `>= 77` inches. Prefer product/item dimensions with an explicit `W` component; ignore seat, arm, door, package, and cushion widths.
- `shipped with Prime` / `Prime shipping` means a visible Prime signal must be detected on the search card or detail page.
- `delivery by tomorrow` and `overnight shipping` require visible delivery text that indicates tomorrow, overnight, next-day, or one-day delivery.
- `top 10 by price` sorts passing products by displayed product price, ascending, and keeps the first 10.
- Missing fields must be represented as `null` or noted in `missingFields` / `extractionNotes`; never fabricate values.
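
The comparison rules above map directly onto operators. The sketch below is illustrative only: `COMPARATORS`, `passes`, and `top_by_price` are hypothetical names, `price` is assumed to be already parsed into a number, and treating every missing value as failing the filter generalizes the unit-price rule as an assumption.

```python
import operator
from typing import Dict, List, Optional

# Phrase fragments mapped to strict or inclusive comparisons.
COMPARATORS = {
    "over": operator.gt,
    "more than": operator.gt,
    "at least": operator.ge,
    "or better": operator.ge,
    "or wider": operator.ge,
    "less than": operator.lt,
}


def passes(value: Optional[float], phrase: str, threshold: float) -> bool:
    """Apply the comparison for a phrase; a missing value never passes."""
    if value is None:
        return False
    return COMPARATORS[phrase](value, threshold)


def top_by_price(products: List[Dict], n: int = 10) -> List[Dict]:
    """Sort by displayed price ascending and keep the first n.

    Assumption: products without a price are left out of the ranking.
    """
    priced = [p for p in products if p.get("price") is not None]
    return sorted(priced, key=lambda p: p["price"])[:n]
```

For example, `passes(200, "over", 200)` is `False` while `passes(200, "at least", 200)` is `True`, which is the distinction the first two bullets draw.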
## Official Alternatives
The Amazon Business Product Search API and the Product Advertising API are the official routes to structured product data when the operator has credentials. This skill uses bounded web automation because the current install request requires `web-automation` scraping.