How Actowiz Solutions extracted ZOZOTOWN's full fashion catalog at the color/size variant level — with deterministic hash-based ASINs and bilingual content fidelity — delivered as a clean, schema-consistent Excel/CSV dataset.
The client, a fashion-retail analytics firm, required a complete, analysis-ready, variant-level catalog dataset from ZOZOTOWN, one of Japan's largest online fashion retail platforms, to support catalog analysis, assortment evaluation, and pricing and competitive benchmarking in the Japanese fashion retail sector.
Actowiz Solutions executed a once-off, full-catalog extraction across all available categories, capturing not just core product details but every available color and size variation for each item. Two structural requirements were central: complete variant-level coverage — where every color/size combination is captured as its own structured record — and a unique hash-based ASIN generated from Goods ID, color, and size to reliably identify each variant downstream.
Data was captured in both Japanese and English exactly as displayed on the platform, with no translation applied, preserving source-language fidelity. The final dataset was delivered in Excel/CSV — clean, deduplicated, and structured for immediate analytical use.
Actowiz designed a full-catalog extraction pipeline built around three core capabilities — complete category traversal, row-wise variant expansion, and hash-based variant identification — delivered as a clean, structured Excel/CSV dataset.
For every product page, the pipeline captured all available color and size combinations and represented each as a separate, fully-attributed row. This row-wise expansion enables analysis at the true SKU/variant level rather than an aggregated product level — essential for accurate assortment and pricing analysis in fashion. The Goods ID (used as PID) stays consistent across all variant rows of the same parent product, so variants can be grouped back to their parent whenever needed.
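The row-wise expansion described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the input shape (a parent dict with a `variants` list) and the helper names `expand_variants` and `group_by_parent` are assumptions made for the example.

```python
from collections import defaultdict


def expand_variants(product):
    """Return one flat row per (color, size) variant of a parent product.

    Assumed input shape: parent-level attributes plus a "variants" list of
    {"color": ..., "size": ...} dicts. Every row keeps the parent's PID.
    """
    shared = {key: value for key, value in product.items() if key != "variants"}
    return [
        {**shared, "Color": variant["color"], "Size": variant["size"]}
        for variant in product["variants"]
    ]


def group_by_parent(rows):
    """Regroup variant rows under their parent Goods ID (PID)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["PID"]].append(row)
    return dict(grouped)


# Hypothetical sample product, for illustration only.
parent = {
    "PID": "12345678",
    "Name": "Cotton Tee",
    "variants": [
        {"color": "Black", "size": "M"},
        {"color": "Black", "size": "L"},
        {"color": "White", "size": "M"},
    ],
}
rows = expand_variants(parent)
print(len(rows), "variant rows for PID", rows[0]["PID"])
```

Because the PID is copied onto every row, `group_by_parent(rows)` recovers the parent-level view whenever an aggregated analysis is needed.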
To uniquely identify each variant, a hash-based ASIN is generated by combining the product's Goods ID with its specific Color and Size values. This produces a deterministic, unique identifier for every color/size combination — the same variant always resolves to the same ASIN — supporting consistent variant tracking without depending on any platform-native variant ID.
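One way to build such an identifier is sketched below. The case study does not state which hash function, separator, or output length was used, so SHA-256, the unit-separator delimiter, the whitespace trimming, and the 16-character truncation here are all assumptions; `make_variant_id` is a hypothetical helper name.

```python
import hashlib


def make_variant_id(goods_id, color, size, length=16):
    """Derive a deterministic ID from Goods ID + Color + Size.

    Assumptions (not from the source): SHA-256 as the hash, values trimmed of
    surrounding whitespace, and a truncated uppercase hex digest as output.
    """
    parts = [str(part).strip() for part in (goods_id, color, size)]
    # A separator that cannot appear in normal text keeps ("12", "3") and
    # ("1", "23") from producing the same key.
    key = "\x1f".join(parts)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return digest[:length].upper()


# Japanese and English values hash the same way because the key is UTF-8 encoded.
first = make_variant_id("12345678", "ブラック", "M")
again = make_variant_id("12345678", "ブラック", "M")
other = make_variant_id("12345678", "ブラック", "L")
print(first == again, first != other)
```

Truncating the digest shortens the ID at the cost of a small collision probability, which is why a post-hoc uniqueness check on the generated IDs remains worthwhile.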
Names, descriptions, and other text were extracted exactly as displayed (Japanese, English, or a mix), depending on what the platform presented for each product. No translation layer was applied at any stage. Text still passed through the cleaning layer that removes HTML tags and junk characters, so source-language fidelity was preserved without carrying markup into the dataset.
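A cleaning step of this kind can be sketched with the standard library alone. The exact rules used in the project are not documented, so the regular expressions, the `clean_text` name, and the `N/A` fallback parameter below are assumptions; the point is that tags and control characters are removed while Japanese characters (including full-width spaces) pass through untouched.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
# Control characters plus zero-width space and byte-order mark.
JUNK_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u200b\ufeff]")
# Only ASCII whitespace and non-breaking spaces are collapsed, so
# full-width (ideographic) spaces inside Japanese text are left as-is.
SPACE_RE = re.compile(r"[ \t\r\n\xa0]+")


def clean_text(raw, fallback="N/A"):
    """Strip HTML tags and junk characters without translating or transliterating."""
    if not raw:
        return fallback
    text = TAG_RE.sub(" ", raw)          # drop markup
    text = html.unescape(text)           # decode entities such as &nbsp;
    text = JUNK_RE.sub("", text)         # drop control/zero-width characters
    text = SPACE_RE.sub(" ", text).strip()
    return text or fallback


print(clean_text("<p>コットン&nbsp;Tシャツ</p>"))
print(clean_text("Cotton <b>Tee</b>\n"))
```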
MRP, sale price, and final price were cross-checked for logical consistency: sale price was validated as ≤ MRP, and discount was calculated as the numeric difference between MRP and final price. All prices were standardized to Japanese Yen (JPY), and boolean fields such as IsOnSale and IsInStock were normalized to consistent TRUE/FALSE values.
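The pricing rules above translate into a small normalization routine. The sketch below is illustrative: the parsing of raw price strings, the helper names, and the rule that IsOnSale is TRUE whenever the discount is positive are assumptions, since the case study only specifies the consistency checks and the output format.

```python
import re


def to_yen(value):
    """Parse values such as '¥3,990', '3990円' or 3990 into whole yen; 0 if unavailable."""
    if value is None:
        return 0
    if isinstance(value, (int, float)):
        return int(value)
    digits = re.sub(r"\D", "", str(value))
    return int(digits) if digits else 0


def to_flag(value):
    """Normalize any truthy/falsy value to the strings TRUE or FALSE."""
    return "TRUE" if value else "FALSE"


def build_price_fields(mrp_raw, sale_raw, final_raw=None, in_stock=True):
    mrp = to_yen(mrp_raw)
    sale = to_yen(sale_raw)
    final = to_yen(final_raw) or sale  # final price falls back to sale price
    if mrp and max(sale, final) > mrp:
        raise ValueError(f"price above MRP: mrp={mrp}, sale={sale}, final={final}")
    if sale and final > sale:
        raise ValueError(f"final price {final} exceeds sale price {sale}")
    discount = mrp - final if mrp and final else 0
    return {
        "Price": mrp,
        "Price Currency": "JPY",
        "Sale Price": sale,
        "Final Price": final,
        "Discount": discount,
        "IsOnSale": to_flag(discount > 0),  # assumption: on sale means discounted
        "IsInStock": to_flag(in_stock),
    }


fields = build_price_fields("¥5,500", "¥3,990")
print(fields["Discount"], fields["IsOnSale"])  # 1510 TRUE
```

Raising on an inconsistent price, rather than silently correcting it, keeps bad source values visible during QA.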
The final Excel/CSV dataset contains 28 standardized attributes per variant row, covering product identity, descriptive content, category hierarchy, pricing, variant detail, and rating information.
| # | Attribute | Description |
|---|---|---|
| 1 | Website | Valid ZOZOTOWN product URL; accessible and begins with https |
| 2 | PID (Goods ID) | Unique product identifier from backend; consistent across all variants |
| 3 | Name | Full product name (JP/EN as available), cleaned of special characters |
| 4 | Short Description | Product short description; N/A if unavailable |
| 5 | Description | Complete description, free of HTML/junk; supports JP/EN |
| 6 | Category | Full breadcrumb category path, pipe-delimited |
| 7 | Image URL | Main product image URL; valid http/https, no short links |
| 8 | Price (MRP) | Maximum Retail Price; matched to source; defaults to 0 if unavailable |
| 9 | Price Currency | Always JPY |
| 10 | Sale Price | Selling price; validated as ≤ MRP; defaults to 0 if unavailable |
| 11 | Final Price | Final payable price; equal to sale price or further discounted price |
| 12 | Discount | Numeric difference between MRP and final price; defaults to 0 |
| 13 | IsOnSale | Boolean (TRUE/FALSE) sale status; mandatory |
| 14 | IsInStock | Boolean (TRUE/FALSE) stock status; mandatory |
| 15 | Keywords | Always set as N/A |
| 16 | Brand | Product brand name; matched to source, not blank |
| 17 | Manufacturer | Manufacturer detail, where available |
| 18 | MPN | Manufacturer Part Number from backend where available, else N/A |
| 19 | ASIN | Hash-generated from Goods ID + Color + Size; unique per variant |
| 20 | SKU | Product SKU, validated from backend where available |
| 21 | Color | Product color variant; every available variation captured |
| 22 | Gender | Product gender category (Men, Women, Unisex); N/A if unavailable |
| 23 | Size | Product size variant; every available variation captured |
| 24 | Variant Price | Price for the specific variant; matched to final price |
| 25 | Alternate Image URLs | Secondary images, pipe-delimited, excluding the main image |
| 26 | Link URL | Valid product page URL; verified non-404 |
| 27 | Num Ratings | Total number of ratings; numeric, defaults to 0 |
| 28 | Average Ratings | Average rating, capped at 5; numeric, defaults to 0 |
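The fixed schema above lends itself to a single column list that drives both null handling and file output. In the sketch below, the exact header strings (for example `PID` rather than `PID (Goods ID)`), the set of zero-default columns, and the helper names are assumptions derived from the table, not the delivered file's actual headers.

```python
import csv
import io

COLUMNS = [
    "Website", "PID", "Name", "Short Description", "Description", "Category",
    "Image URL", "Price", "Price Currency", "Sale Price", "Final Price",
    "Discount", "IsOnSale", "IsInStock", "Keywords", "Brand", "Manufacturer",
    "MPN", "ASIN", "SKU", "Color", "Gender", "Size", "Variant Price",
    "Alternate Image URLs", "Link URL", "Num Ratings", "Average Ratings",
]
# Columns the table describes as defaulting to 0 when unavailable.
ZERO_DEFAULT = {"Price", "Sale Price", "Discount", "Num Ratings", "Average Ratings"}


def to_row(record):
    """Project a record onto the fixed schema, filling gaps with 0 or N/A."""
    row = {}
    for column in COLUMNS:
        value = record.get(column)
        if value in (None, ""):
            value = 0 if column in ZERO_DEFAULT else "N/A"
        row[column] = value
    return row


def write_csv(records, handle):
    """Write records with a stable header order; unknown keys are ignored."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(to_row(record))


buffer = io.StringIO()
write_csv([{"PID": "12345678", "Name": "コットンTシャツ", "Color": "ブラック", "Size": "M"}], buffer)
print(buffer.getvalue().splitlines()[0])
```

When writing to disk for Excel users, opening the file with `encoding="utf-8-sig"` and `newline=""` is a common way to make Japanese text display correctly on open.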
Each batch underwent a structured QA process before final delivery:
| Validation Check | Rule Applied |
|---|---|
| Mandatory field completeness | Core fields (PID, Name, Category, Price, IsOnSale, IsInStock) never blank |
| URL validation | Website and Link URL verified as accessible and non-404 |
| Pricing consistency | Sale Price ≤ MRP; Discount = MRP minus Final Price |
| Rating validation | Average Ratings validated as numeric and ≤ 5 |
| Variant completeness | All color/size combinations per product confirmed as separate rows |
| ASIN uniqueness | Hash-based ASIN verified unique per Goods ID + Color + Size |
| Text cleanliness | HTML tags and junk removed from Name, Description, Short Description |
| Category path validation | Breadcrumb path verified complete and correctly pipe-delimited |
| Currency standardization | Price Currency validated as JPY across all records |
| Boolean field validation | IsOnSale and IsInStock validated as strict TRUE/FALSE |
| Duplicate prevention | Duplicate variant records removed prior to delivery |
| Null handling compliance | Unavailable text attributes represented as N/A; unavailable numeric fields (prices, discount, ratings) default to 0 |
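Several of these rules can be expressed as an automated pass over the variant rows. The following is a minimal sketch under stated assumptions: rows are dicts with numeric price and rating fields already parsed, column names follow the schema table in shortened form, and `validate_rows` is a hypothetical helper. URL accessibility checks are left out because they require network access.

```python
from collections import Counter

MANDATORY = ("PID", "Name", "Category", "Price", "IsOnSale", "IsInStock")


def validate_rows(rows):
    """Return (row_index, message) pairs for every rule violation found."""
    issues = []
    asin_counts = Counter(row.get("ASIN") for row in rows)
    for index, row in enumerate(rows):
        for field in MANDATORY:
            if row.get(field) in (None, ""):
                issues.append((index, f"{field} is blank"))
        mrp = row.get("Price") or 0
        sale = row.get("Sale Price") or 0
        final = row.get("Final Price") or 0
        if mrp and sale > mrp:
            issues.append((index, "Sale Price exceeds MRP"))
        if mrp and final and row.get("Discount") != mrp - final:
            issues.append((index, "Discount != MRP - Final Price"))
        if not 0 <= (row.get("Average Ratings") or 0) <= 5:
            issues.append((index, "Average Ratings outside 0-5"))
        for flag in ("IsOnSale", "IsInStock"):
            if row.get(flag) not in ("TRUE", "FALSE"):
                issues.append((index, f"{flag} is not TRUE/FALSE"))
        if row.get("Price Currency") != "JPY":
            issues.append((index, "Price Currency is not JPY"))
        if asin_counts[row.get("ASIN")] > 1:
            issues.append((index, "ASIN is not unique"))
    return issues


# Hypothetical sample row, for illustration only.
sample = {
    "PID": "12345678", "Name": "Cotton Tee", "Category": "Tops|T-Shirts",
    "Price": 5500, "Sale Price": 3990, "Final Price": 3990, "Discount": 1510,
    "IsOnSale": "TRUE", "IsInStock": "TRUE", "Price Currency": "JPY",
    "ASIN": "A1B2C3", "Average Ratings": 4.2,
}
print(validate_rows([sample]))  # []
```

Returning a list of issues instead of stopping at the first failure lets a whole batch be reviewed in one pass before delivery.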

Project snapshot:

| Metric | Value |
|---|---|
| Platform | ZOZOTOWN |
| Industry | Fashion & Apparel |
| Geography | Japan |
| Category Coverage | All available categories on the platform |
| Variant Coverage | All color and size combinations per product |
| Language | Japanese and English, as on-site (no translation) |
| Variant Identifier | Hash-based ASIN (Goods ID + Color + Size) |
| Output Schema | Fixed 28 columns |
| Output Format | Excel (.xlsx) / CSV (.csv) |
| Frequency | Once-off |
| Timeline | 3 to 4 working days |
"Every color and size as its own row with a stable ID meant real variant-level analysis at last. Bilingual content preserved, zero cleanup — it loaded straight into our tools."
— Lead Catalog Analyst
Actowiz Solutions designs custom, large-scale scraping, extraction, and API-delivery pipelines with rigorous QA. Visit actowizsolutions.com to discuss your data requirement.