A fashion-retail analytics client needed a complete, variant-level product catalog from ZOZOTOWN, one of Japan's largest online fashion platforms, for assortment and pricing benchmarking.

Industry: Fashion & Apparel
Region: Japan
Duration: 3–4 Working Days
Fields per Variant: 28
Color & Size Coverage: Variant-Level
Delivery Timeline: 3–4 Days
Products Extracted: 350,000+

Client Overview

The client required a complete, analysis-ready fashion catalog dataset from ZOZOTOWN — one of Japan's largest online fashion retail platforms — to support catalog analysis, assortment evaluation, and competitive benchmarking in the Japanese fashion retail sector.

Actowiz Solutions executed a once-off, full-catalog extraction across all available categories, capturing not just core product details but every available color and size variation for each item. Two structural requirements were central: complete variant-level coverage — where every color/size combination is captured as its own structured record — and a unique hash-based ASIN generated from Goods ID, color, and size to reliably identify each variant downstream.

Data was captured in both Japanese and English exactly as displayed on the platform, with no translation applied, preserving source-language fidelity. The final dataset was delivered in Excel/CSV — clean, deduplicated, and structured for immediate analytical use.

The Challenge

  • Variant-level complexity. Fashion products commonly carry multiple color and size combinations per listing. Capturing every combination as a distinct, correctly mapped record — rather than collapsing variants into a single row — significantly increased extraction complexity.
  • Reliable variant identification. The platform's native Goods ID (used as PID) identifies a product but not a specific variant. A consistent method was needed to uniquely identify each color/size combination without relying on platform-native variant IDs alone.
  • Bilingual content without translation. Product names and descriptions had to be captured in whichever language the platform displayed (Japanese or English), with strict instructions not to apply any translation layer that could alter source meaning.
  • Pricing logic consistency. MRP, sale price, final price, and discount needed strict logical consistency (e.g., sale price must not exceed MRP), requiring careful handling of the platform's pricing display across regular and discounted listings.
  • Text quality. Descriptions and names had to be free of HTML tags and junk characters, requiring a dedicated text-cleaning layer applied consistently across all bilingual content.

The Solution by Actowiz Solutions

Actowiz designed a full-catalog extraction pipeline built around three core capabilities — complete category traversal, row-wise variant expansion, and hash-based variant identification — delivered as a clean, structured Excel/CSV dataset.

Variant-Level Extraction

For every product page, the pipeline captured all available color and size combinations and represented each as a separate, fully-attributed row. This row-wise expansion enables analysis at the true SKU/variant level rather than an aggregated product level — essential for accurate assortment and pricing analysis in fashion. The Goods ID (used as PID) stays consistent across all variant rows of the same parent product, so variants can be grouped back to their parent whenever needed.
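
The row-wise expansion described above can be sketched in Python as follows. This is a minimal illustration and not the pipeline's actual code: the `expand_variants` function, the shape of the parsed product dictionary, and its `variants` key are assumptions made for the example.

```python
from typing import Any


def expand_variants(product: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per color/size variant of a parsed product."""
    # Parent-level attributes are copied onto every variant row so each
    # row is a complete record on its own.
    parent = {k: v for k, v in product.items() if k != "variants"}
    rows = []
    for variant in product["variants"]:
        row = dict(parent)
        row["Color"] = variant["color"]
        row["Size"] = variant["size"]
        row["IsInStock"] = variant["in_stock"]
        rows.append(row)
    return rows


# Hypothetical parsed product; all values are illustrative only.
product = {
    "PID": "12345678",
    "Name": "Sample Tee",
    "variants": [
        {"color": "White", "size": "M", "in_stock": True},
        {"color": "White", "size": "L", "in_stock": False},
        {"color": "Black", "size": "M", "in_stock": True},
    ],
}

rows = expand_variants(product)
```

Because every row carries the same `PID`, grouping the rows by that key recovers the parent product.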

Hash-Based ASIN Generation

To uniquely identify each variant, a hash-based ASIN is generated by combining the product's Goods ID with its specific Color and Size values. This produces a deterministic, unique identifier for every color/size combination — the same variant always resolves to the same ASIN — supporting consistent variant tracking without depending on any platform-native variant ID.
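
A deterministic identifier of this kind could be derived roughly as shown below. The case study does not name the hash function, the output length, or any normalization rules, so SHA-256, the 16-character truncation, the whitespace stripping, and the function name `make_variant_asin` are all assumptions for this sketch.

```python
import hashlib


def make_variant_asin(goods_id: str, color: str, size: str) -> str:
    """Derive a deterministic identifier from Goods ID, color, and size."""
    # A separator that does not occur in normal text keeps ("AB", "C")
    # and ("A", "BC") from collapsing into the same input string.
    key = "\x1f".join(part.strip() for part in (goods_id, color, size))
    # SHA-256 and the 16-character truncation are assumptions of this sketch.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16].upper()


# Illustrative values only; Japanese text is hashed as UTF-8 bytes.
a = make_variant_asin("12345678", "ブラック", "M")
b = make_variant_asin("12345678", "ブラック", "M")
c = make_variant_asin("12345678", "ブラック", "L")
```

Here `a` and `b` are equal because the inputs are identical, while `c` differs because the size differs. Truncating a hash trades a shorter identifier for a small collision risk, which is one reason a separate uniqueness check remains useful.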

Bilingual Content Capture

Names, descriptions, and other text were extracted exactly as displayed — Japanese, English, or a mix — depending on what the platform presented for each product. No translation layer was applied at any stage, preserving full fidelity while still passing through the text-cleaning layer that removes HTML tags and junk characters.
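
A language-agnostic cleaning step along these lines might look like the sketch below. It uses only the Python standard library; the function name `clean_text`, the regex-based tag stripping, and the choice to return `N/A` for empty results are assumptions, not details confirmed by the case study.

```python
import html
import re
import unicodedata

_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags and control characters without altering language."""
    # Replace tags with a space so adjacent words do not run together.
    text = _TAG_RE.sub(" ", raw)
    # Decode entities such as &nbsp; or &amp; into real characters.
    text = html.unescape(text)
    # Drop control/format characters (Unicode categories starting with "C"),
    # keeping whitespace, which is collapsed in the next step.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    text = _WS_RE.sub(" ", text).strip()
    return text if text else "N/A"


cleaned = clean_text("<p>コットン100%&nbsp;<b>Tシャツ</b></p>")
```

Nothing in this step inspects or changes the language of the text, so Japanese, English, and mixed content pass through the same path.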

Pricing Logic & Standardization

MRP, sale price, and final price were cross-checked for logical consistency — sale price validated as ≤ MRP, with discount calculated as the numeric difference between MRP and final price. All prices were standardized to Japanese Yen (JPY), and boolean fields such as IsOnSale and IsInStock normalized to consistent TRUE/FALSE values.
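
The pricing rules above can be expressed as a small validation function. This is a sketch under stated assumptions: prices are whole-yen integers, `IsOnSale` is derived from a positive discount, and the names `Pricing` and `build_pricing` are hypothetical. It does not cover the case where a price is unavailable and defaults to 0.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    mrp: int
    sale_price: int
    final_price: int
    discount: int
    is_on_sale: bool
    currency: str = "JPY"


def build_pricing(mrp: int, sale_price: int, final_price: int) -> Pricing:
    """Validate price ordering and derive the discount fields."""
    if sale_price > mrp:
        raise ValueError(f"sale price {sale_price} exceeds MRP {mrp}")
    # The schema describes the final price as the sale price or lower.
    if final_price > sale_price:
        raise ValueError(
            f"final price {final_price} exceeds sale price {sale_price}"
        )
    discount = mrp - final_price
    # Deriving the sale flag from the discount is an assumption of this sketch.
    return Pricing(mrp, sale_price, final_price, discount, is_on_sale=discount > 0)


# Illustrative figures: MRP 5,000 JPY, sale price 4,000 JPY, final 3,600 JPY.
pricing = build_pricing(5000, 4000, 3600)
```

With these figures the discount is 5000 minus 3600, that is 1400 JPY, and the listing is flagged as on sale.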

Output Data Attributes (28-Column Schema)

The final Excel/CSV dataset contains standardized attributes per variant row, covering product identity, descriptive content, category hierarchy, pricing, variant detail, and rating information.

| # | Attribute | Description |
|---|---|---|
| 1 | Website | Valid ZOZOTOWN product URL; accessible and begins with https |
| 2 | PID (Goods ID) | Unique product identifier from backend; consistent across all variants |
| 3 | Name | Full product name (JP/EN as available), cleaned of special characters |
| 4 | Short Description | Product short description; N/A if unavailable |
| 5 | Description | Complete description, free of HTML/junk; supports JP/EN |
| 6 | Category | Full breadcrumb category path, pipe-delimited |
| 7 | Image URL | Main product image URL; valid http/https, no short links |
| 8 | Price (MRP) | Maximum Retail Price; matched to source; defaults to 0 if unavailable |
| 9 | Price Currency | Always JPY |
| 10 | Sale Price | Selling price; validated as ≤ MRP; defaults to 0 if unavailable |
| 11 | Final Price | Final payable price; equal to sale price or further discounted price |
| 12 | Discount | Numeric difference between MRP and final price; defaults to 0 |
| 13 | IsOnSale | Boolean (TRUE/FALSE) sale status; mandatory |
| 14 | IsInStock | Boolean (TRUE/FALSE) stock status; mandatory |
| 15 | Keywords | Always set as N/A |
| 16 | Brand | Product brand name; matched to source, not blank |
| 17 | Manufacturer | Manufacturer detail, where available |
| 18 | MPN | Manufacturer Part Number from backend where available, else N/A |
| 19 | ASIN | Hash-generated from Goods ID + Color + Size; unique per variant |
| 20 | SKU | Product SKU, validated from backend where available |
| 21 | Color | Product color variant; every available variation captured |
| 22 | Gender | Product gender category (Men, Women, Unisex); N/A if unavailable |
| 23 | Size | Product size variant; every available variation captured |
| 24 | Variant Price | Price for the specific variant; matched to final price |
| 25 | Alternate Image URLs | Secondary images, pipe-delimited, excluding the main image |
| 26 | Link URL | Valid product page URL; verified non-404 |
| 27 | Num Ratings | Total number of ratings; numeric, defaults to 0 |
| 28 | Average Ratings | Average rating, capped at 5; numeric, defaults to 0 |
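
One way to hold a file to a fixed column order is sketched below with Python's `csv.DictWriter`. The header spellings are taken from the attribute names in the table above and may differ from the delivered file; the `write_catalog` function and its default-filling behavior are assumptions for illustration, and mandatory-field checks would need to run separately.

```python
import csv
import io

COLUMNS = [
    "Website", "PID (Goods ID)", "Name", "Short Description", "Description",
    "Category", "Image URL", "Price (MRP)", "Price Currency", "Sale Price",
    "Final Price", "Discount", "IsOnSale", "IsInStock", "Keywords", "Brand",
    "Manufacturer", "MPN", "ASIN", "SKU", "Color", "Gender", "Size",
    "Variant Price", "Alternate Image URLs", "Link URL", "Num Ratings",
    "Average Ratings",
]

# Columns the table describes as defaulting to 0 rather than N/A.
ZERO_DEFAULT = {
    "Price (MRP)", "Sale Price", "Discount", "Num Ratings", "Average Ratings",
}


def write_catalog(rows, handle):
    """Write rows in the fixed column order; reject unknown columns."""
    # extrasaction="raise" makes DictWriter raise ValueError for any key
    # that is not part of the schema.
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, extrasaction="raise")
    writer.writeheader()
    for row in rows:
        filled = {c: ("0" if c in ZERO_DEFAULT else "N/A") for c in COLUMNS}
        filled.update(row)
        writer.writerow(filled)


buffer = io.StringIO()
write_catalog(
    [{"PID (Goods ID)": "12345678", "Color": "Black", "Size": "M"}], buffer
)
header, first = buffer.getvalue().splitlines()[:2]
```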

Technical Approach

  • Category traversal engine. Systematic navigation across all accessible ZOZOTOWN categories, supplemented by manual coverage verification due to the absence of a reliable sitemap.
  • PDP parsing. Structured extraction of all core attributes from each product detail page — description, pricing, brand, and image data.
  • Variant expansion logic. Automated detection and row-wise expansion of all color/size combinations, so every variant is its own complete record.
  • Hash-based ASIN generation. Deterministic hashing of Goods ID + Color + Size to produce a stable, unique variant identifier.
  • Bilingual text capture. Language-agnostic extraction capturing whatever language ZOZOTOWN displays per field, with no translation logic applied.
  • Text-cleaning pipeline. HTML tag stripping and junk-character removal applied uniformly across all descriptive fields, regardless of source language.
  • Pricing and rating validation logic. Automated cross-checks ensuring Sale Price ≤ MRP and Average Ratings ≤ 5, with calculated Discount and standardized JPY tagging.
  • Image URL handling. Main and alternate images extracted separately; alternates concatenated with a pipe delimiter.
  • Deduplication & schema enforcement. Final dataset deduplicated and validated against the fixed 28-column schema before delivery.
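
Two of the steps listed above, image URL handling and deduplication, can be sketched as follows. The function names, the assumption that the first URL in the list is the main image, and the choice to keep the first row seen per ASIN are all hypothetical; the URLs are placeholders.

```python
def split_images(image_urls: list[str]) -> tuple[str, str]:
    """Return (main image, pipe-delimited alternates without the main image)."""
    if not image_urls:
        return "N/A", "N/A"
    main = image_urls[0]
    # dict.fromkeys keeps first-seen order while dropping repeated URLs.
    alternates = [u for u in dict.fromkeys(image_urls[1:]) if u != main]
    return main, ("|".join(alternates) if alternates else "N/A")


def dedupe_by_asin(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each ASIN."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        if row["ASIN"] not in seen:
            seen.add(row["ASIN"])
            unique.append(row)
    return unique


main, alternates = split_images([
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.com/a.jpg",
    "https://example.com/c.jpg",
])
unique_rows = dedupe_by_asin([
    {"ASIN": "X1", "Size": "M"},
    {"ASIN": "X2", "Size": "L"},
    {"ASIN": "X1", "Size": "M"},
])
```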

Quality Assurance

Each batch underwent a structured QA process before final delivery:

| Validation Check | Rule Applied |
|---|---|
| Mandatory field completeness | Core fields (PID, Name, Category, Price, IsOnSale, IsInStock) never blank |
| URL validation | Website and Link URL verified as accessible and non-404 |
| Pricing consistency | Sale Price ≤ MRP; Discount = MRP minus Final Price |
| Rating validation | Average Ratings validated as numeric and ≤ 5 |
| Variant completeness | All color/size combinations per product confirmed as separate rows |
| ASIN uniqueness | Hash-based ASIN verified unique per Goods ID + Color + Size |
| Text cleanliness | HTML tags and junk removed from Name, Description, Short Description |
| Category path validation | Breadcrumb path verified complete and correctly pipe-delimited |
| Currency standardization | Price Currency validated as JPY across all records |
| Boolean field validation | IsOnSale and IsInStock validated as strict TRUE/FALSE |
| Duplicate prevention | Duplicate variant records removed prior to delivery |
| Null handling compliance | Unavailable attributes represented as N/A per defined standard |
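
Several of these rules lend themselves to automated row-level checks. The sketch below covers a subset (mandatory fields, pricing, rating, currency, booleans, and ASIN uniqueness); URL reachability and variant completeness are left out because they need network access or the source page. The function names and the dictionary keys are assumptions based on the schema's attribute names.

```python
from collections import Counter

MANDATORY = (
    "PID (Goods ID)", "Name", "Category", "Price (MRP)", "IsOnSale", "IsInStock",
)


def check_row(row: dict) -> list[str]:
    """Return human-readable rule violations for one variant row."""
    problems = []
    for field in MANDATORY:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"{field} is blank")
    mrp = row.get("Price (MRP)") or 0
    sale = row.get("Sale Price") or 0
    final = row.get("Final Price") or 0
    if sale > mrp:
        problems.append("Sale Price exceeds MRP")
    if row.get("Discount", 0) != mrp - final:
        problems.append("Discount is not MRP minus Final Price")
    if not 0 <= (row.get("Average Ratings") or 0) <= 5:
        problems.append("Average Ratings out of range")
    if row.get("Price Currency") != "JPY":
        problems.append("Price Currency is not JPY")
    for flag in ("IsOnSale", "IsInStock"):
        if row.get(flag) not in ("TRUE", "FALSE"):
            problems.append(f"{flag} is not TRUE/FALSE")
    return problems


def duplicate_asins(rows: list[dict]) -> list[str]:
    """Return every ASIN that appears on more than one row."""
    counts = Counter(row["ASIN"] for row in rows)
    return [asin for asin, n in counts.items() if n > 1]


# Illustrative rows only.
good = {
    "PID (Goods ID)": "12345678", "Name": "Sample Tee",
    "Category": "Men|Tops|T-Shirts", "Price (MRP)": 5000, "Sale Price": 4000,
    "Final Price": 4000, "Discount": 1000, "IsOnSale": "TRUE",
    "IsInStock": "TRUE", "Price Currency": "JPY", "Average Ratings": 4.5,
    "ASIN": "A1",
}
bad = {**good, "Sale Price": 6000, "IsInStock": "yes", "Average Ratings": 5.5}
```

Calling `check_row(good)` returns an empty list, while `check_row(bad)` reports the sale price, rating, and boolean violations.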

Results & Business Impact

  • Complete catalog visibility. Full category coverage gave clear visibility into the breadth of ZOZOTOWN's assortment, supporting catalog benchmarking; breadcrumb paths enabled category-level rollups at any hierarchy level.
  • True variant-level analysis. Row-wise color/size expansion enabled accurate variant-level assortment analysis — size-curve and color-mix analysis across the catalog — with hash-based ASINs providing a stable, reusable variant identifier.
  • Pricing & promotion intelligence. Consistent MRP, sale, final, and discount fields gave clear visibility into promotional activity and price positioning, standardized to JPY.
  • Multilingual data fidelity. Capturing data in original Japanese or English without translation preserved source fidelity for any downstream language-specific processing.
  • Analysis-ready delivery. The clean, deduplicated, schema-consistent Excel/CSV was delivered within the 3–4 working-day window and loaded directly into analytical tools with no further transformation.
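
As an illustration of the downstream analysis mentioned above, the sketch below rolls variant rows up by breadcrumb depth and computes a simple size curve. The function names, the sample rows, and the definition of size curve as the share of in-stock variant rows per size are assumptions for the example, not the client's actual method.

```python
from collections import Counter


def rollup_by_level(rows, level):
    """Count variant rows per category prefix of the given depth."""
    counts = Counter()
    for row in rows:
        prefix = "|".join(row["Category"].split("|")[:level])
        counts[prefix] += 1
    return counts


def size_curve(rows):
    """Share of in-stock variant rows per size."""
    in_stock = [r["Size"] for r in rows if r["IsInStock"] == "TRUE"]
    total = len(in_stock)
    if total == 0:
        return {}
    return {size: n / total for size, n in Counter(in_stock).items()}


# Illustrative rows only.
sample = [
    {"Category": "Men|Tops|T-Shirts", "Size": "M", "IsInStock": "TRUE"},
    {"Category": "Men|Tops|T-Shirts", "Size": "L", "IsInStock": "TRUE"},
    {"Category": "Men|Bottoms|Jeans", "Size": "M", "IsInStock": "FALSE"},
    {"Category": "Women|Tops|Blouses", "Size": "M", "IsInStock": "TRUE"},
]

top_level = rollup_by_level(sample, 1)
second_level = rollup_by_level(sample, 2)
curve = size_curve(sample)
```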

Project at a Glance

| Metric | Value |
|---|---|
| Platform | ZOZOTOWN |
| Industry | Fashion & Apparel |
| Geography | Japan |
| Category Coverage | All available categories on the platform |
| Variant Coverage | All color and size combinations per product |
| Language | Japanese and English, as on-site (no translation) |
| Variant Identifier | Hash-based ASIN (Goods ID + Color + Size) |
| Output Schema | Fixed 28 columns |
| Output Format | Excel (.xlsx) / CSV (.csv) |
| Frequency | Once-off |
| Timeline | 3 to 4 working days |

Client Feedback

"Every color and size as its own row with a stable ID meant real variant-level analysis at last. Bilingual content preserved, zero cleanup — it loaded straight into our tools."

— Lead Catalog Analyst

Need a custom data pipeline for your platform?

Actowiz Solutions designs custom, large-scale scraping, extraction, and API-delivery pipelines with rigorous QA. Visit actowizsolutions.com to discuss your data requirement.
