Learn how Actowiz Solutions scraped SKU-level product data including images, descriptions, and specifications to build accurate, ready-to-list eCommerce datasets.
Location: Chicago, USA
Industry: E-commerce & Product Distribution
Objective: To extract detailed product data for 200 SKUs (initial batch) — including images, descriptions, specifications, and identifiers (UPC/MPN) — from multiple manufacturer and retailer websites.
The client's ultimate goal was to build a comprehensive master catalog for internal listing and marketplace uploads. Upon successful delivery, the project would scale to 2,000+ SKUs across multiple brands and categories.
The client required structured product data suitable for uploading to marketplaces like Amazon, Walmart, Shopify, and eBay. Each SKU needed verified and enriched fields including:
Output Format: Excel or CSV
Accuracy Target: ≥ 98% verified completeness
Extracting detailed product data across hundreds of SKUs seems simple, but it presents multiple technical and operational challenges:
Actowiz Solutions was tasked to:
| Function | Tools / Frameworks |
|---|---|
| Core Scraper | Python (Scrapy + Requests + Playwright) |
| HTML Parsing | BeautifulSoup4, lxml |
| JavaScript Handling | Playwright (headless Chromium) |
| Image Extraction | Regex pattern matching & attribute parsing |
| Data Cleaning | Pandas + Regular Expressions |
| Validation | UPC & MPN regex filters |
| Output | Excel / CSV via Pandas |
| Logging | Python logging + custom retry handler |
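The stack lists a custom retry handler alongside Python logging, but the case study does not describe how it works. The following is a minimal sketch, assuming exponential backoff and a decorator interface. The name `retry` and its parameters are hypothetical, not the project's actual code.

```python
import functools
import logging
import time

logger = logging.getLogger("scraper")


def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Hypothetical retry decorator: re-run the wrapped call with
    exponential backoff and log every failed attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == max_attempts:
                        # Out of attempts: log and re-raise so the caller
                        # can record the SKU as failed.
                        logger.error("%s failed after %d attempts: %s",
                                     func.__name__, attempt, exc)
                        raise
                    # With base_delay=1.0 this waits 1s, 2s, 4s, ...
                    delay = base_delay * 2 ** (attempt - 1)
                    logger.warning("%s attempt %d/%d failed (%s); retrying in %.1fs",
                                   func.__name__, attempt, max_attempts, exc, delay)
                    time.sleep(delay)
        return wrapper
    return decorator
```

In use, a page-fetching function would be decorated with something like `@retry(max_attempts=4, exceptions=(ConnectionError, TimeoutError))` so that only transient network errors are retried, while parsing bugs fail immediately.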
[ Product URLs / SKU List ]
↓
[ Scrapy Spider → Playwright Renderer (for dynamic sites) ]
↓
[ Extraction Layer ]
→ Product Title
→ Description (clean HTML)
→ Image URLs
→ Specs (Weight, Color, etc.)
↓
[ Validation & Deduplication ]
↓
[ Data Cleaning & Normalization ]
↓
[ Export → Excel / CSV + Quality Report ]
Many retailer websites used lazy-loaded content, so Actowiz Solutions used a headless Playwright browser to render the DOM fully before parsing text and image elements.
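A minimal sketch of rendering a lazy-loading page before parsing, assuming Playwright's synchronous Python API and assuming that scrolling to the bottom is what triggers further loading. The function name `render_page` and the scroll limits are illustrative, not the project's settings.

```python
def render_page(url: str, max_scrolls: int = 10, pause_ms: int = 800) -> str:
    """Hypothetical helper: load `url` in headless Chromium, scroll until
    the page stops growing, and return the fully rendered HTML."""
    # Imported inside the function so this module loads without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            height = page.evaluate("document.body.scrollHeight")
            for _ in range(max_scrolls):
                # Scroll to the bottom to trigger lazy-loaded images/sections.
                page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                page.wait_for_timeout(pause_ms)
                new_height = page.evaluate("document.body.scrollHeight")
                if new_height == height:  # nothing new loaded; stop
                    break
                height = new_height
            return page.content()
        finally:
            browser.close()
```

The returned markup (for example from `render_page("https://example.com/product/123")`) can then be handed to BeautifulSoup or lxml as usual. Waiting for `networkidle` can time out on pages with continuous background requests; waiting for `load` and relying on the scroll loop is a reasonable fallback.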
Descriptions were cleaned with regex rules that removed extra line breaks, leftover HTML tags, and embedded scripts while preserving bullet points and basic formatting.
For each SKU, all `<img>` tags inside the product gallery section were scraped, and their image paths were resolved into absolute URLs.
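The gallery step can be sketched as follows. The project used BeautifulSoup and lxml; this version uses the standard-library `html.parser` to stay dependency-free. The container class `product-gallery` is a hypothetical placeholder (real sites differ), and preferring `data-src` over `src` is an assumption about how lazy-loading galleries store their real image URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class GalleryImageParser(HTMLParser):
    """Collect <img> URLs that appear inside the gallery container."""

    def __init__(self, base_url, gallery_class):
        super().__init__()
        self.base_url = base_url
        self.gallery_class = gallery_class  # hypothetical class name
        self.depth = 0  # > 0 while inside the gallery container
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "img":
                # Assumption: lazy-loading galleries keep the real URL in data-src.
                src = attrs.get("data-src") or attrs.get("src")
                if src:
                    self.urls.append(urljoin(self.base_url, src))
            elif tag not in VOID_TAGS:
                self.depth += 1
        elif self.gallery_class in (attrs.get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_gallery_images(html, base_url, gallery_class="product-gallery"):
    """Return absolute, de-duplicated image URLs from the product gallery."""
    parser = GalleryImageParser(base_url, gallery_class)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.urls))  # drop duplicates, keep order
```

`urljoin` handles root-relative paths (`/img/a.jpg`) and protocol-relative CDN paths (`//cdn.example.com/b.jpg`) against the product page URL, which covers the common ways galleries write image paths.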
Patterns like `(\b\d{8,14}\b)` were used to detect candidate numeric UPCs, that is, standalone runs of 8 to 14 digits. Alphanumeric MPNs were standardized to uppercase.
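The regex only finds digit runs of the right length; it does not prove a number is a real UPC. One common extra filter, not stated in the case study and included here as an assumption, is the GS1 mod-10 check digit used by UPC-A, EAN-13, and the other GTIN formats. The function names are hypothetical.

```python
import re

# Candidate pattern from the case study; re.ASCII restricts \d to 0-9.
UPC_PATTERN = re.compile(r"\b\d{8,14}\b", re.ASCII)
GTIN_LENGTHS = {8, 12, 13, 14}  # GTIN-8, UPC-A (GTIN-12), EAN-13, GTIN-14


def has_valid_check_digit(code: str) -> bool:
    """GS1 mod-10 check: weights 3, 1, 3, ... applied right-to-left,
    starting with the digit just before the check digit."""
    if len(code) not in GTIN_LENGTHS or not (code.isascii() and code.isdigit()):
        return False
    body, check = code[:-1], int(code[-1])
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def find_upcs(text: str) -> list:
    """Hypothetical helper: regex candidates that also pass the check digit."""
    return [m for m in UPC_PATTERN.findall(text) if has_valid_check_digit(m)]


def normalize_mpn(mpn: str) -> str:
    """Standardize an MPN: trim surrounding whitespace and uppercase it."""
    return mpn.strip().upper()
```

For example, `find_upcs("UPC: 036000291452, model 12345")` keeps only the 12-digit code. Note that a UPC-A value whose leading zero was dropped somewhere upstream (leaving 11 digits) would be rejected by this filter and would need zero-padding to 12 digits before checking.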
Product detail tables were mapped into key-value pairs. Example:
| Attribute | Extracted |
|---|---|
| Weight | 1.5 kg |
| Dimensions | 25x18x9 cm |
| Color | Blue |
| Material | Aluminum |
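The table-to-dictionary mapping can be sketched with the standard-library `html.parser`, assuming spec tables use one `<tr>` per attribute with a label cell followed by a value cell. The class and function names are hypothetical; rows that do not have exactly two cells are skipped rather than guessed at.

```python
from html.parser import HTMLParser


class SpecTableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Collapse internal whitespace: " 25x18x9   cm " -> "25x18x9 cm"
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_spec_table(html):
    """Map two-cell rows (label, value) to a dict; other rows are skipped."""
    parser = SpecTableParser()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(":"): row[1]
            for row in parser.rows if len(row) == 2 and row[0]}
```

Markup nested inside a cell (such as `<span>Blue</span>`) is flattened to its text, and trailing colons on labels are dropped so that `Dimensions:` and `Dimensions` map to the same key.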
If multiple sources listed the same SKU, the scraper prioritized official brand/manufacturer data for consistency.
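A minimal sketch of source prioritization, assuming each scraped record carries a hypothetical `source_type` label. The manufacturer-first rule comes from the case study; filling blank fields from lower-priority sources is an added assumption, not a documented part of the project.

```python
# Hypothetical source labels; a lower number means a more trusted source.
SOURCE_PRIORITY = {"manufacturer": 0, "brand": 0, "retailer": 1}


def merge_sku_records(records):
    """Merge scraped records that share a SKU.

    The most trusted source wins every field it has a value for; blank
    fields are then filled from less trusted sources (assumed fallback).
    """
    # sorted() is stable, so equal-priority records keep their scrape order;
    # unknown source types sort last.
    ordered = sorted(records,
                     key=lambda r: SOURCE_PRIORITY.get(r.get("source_type"), 99))
    merged = {}
    for rec in ordered:
        target = merged.setdefault(rec["sku"], {})
        for field, value in rec.items():
            if value and not target.get(field):
                target[field] = value
    return merged
```

Because the manufacturer record is processed first, its values are never overwritten; a retailer record can only supply fields the manufacturer left empty.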
| SKU | Product Name | Description | MPN | UPC | Weight | Color | Image URL | City of Origin |
|---|---|---|---|---|---|---|---|---|
| SKU001 | Stainless Steel Travel Mug 500ml | Insulated 500ml stainless mug with spill-proof lid. | TM-500SS | 87432948172 | 0.5 kg | Silver | https://example.com/mug.jpg | Chicago |
| SKU002 | Noise-Canceling Headphones | Wireless headphones with 20h battery life. | NC-H200 | 098432874321 | 0.9 kg | Black | https://example.com/headphones.jpg | Boston |
| SKU003 | Yoga Mat Eco 6mm | Non-slip eco mat with carrying strap. | YM-ECO6 | 88213457492 | 1.2 kg | Blue | https://example.com/yogamat.jpg | Austin |
| SKU004 | Smartwatch Sport 4.0 | Waterproof smartwatch with heart rate monitor. | SW-4SPORT | 987654112345 | 0.35 kg | Red | https://example.com/watch.jpg | Miami |
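The project exported through Pandas. The sketch below uses the standard-library `csv` module instead, so it runs without extra dependencies. The column order mirrors the sample output table, and the function name `write_catalog_csv` is hypothetical.

```python
import csv

# Column order mirrors the sample output table.
COLUMNS = ["SKU", "Product Name", "Description", "MPN", "UPC",
           "Weight", "Color", "Image URL", "City of Origin"]


def write_catalog_csv(records, stream):
    """Write product records as CSV with a fixed column order.

    Values are written as-is (strings), so UPCs keep their leading zeros.
    Missing fields become empty cells; extra keys are ignored.
    """
    writer = csv.DictWriter(stream, fieldnames=COLUMNS,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
```

For a file, open it with `open("catalog.csv", "w", newline="", encoding="utf-8")` as the `csv` module documentation recommends. Keeping UPCs as text matters: Excel drops leading zeros from numeric-looking cells when it opens a CSV, and a Pandas round trip needs `pd.read_csv(..., dtype={"UPC": str})` to avoid the same loss.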
Before (raw HTML): `<p><b>Features:</b><br>High-quality material<br>Available in red and black<br><script>alert('promo');</script></p>`
After (cleaned): `Features: • High-quality material • Available in red and black`
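The before/after pair above can be reproduced with a handful of regex passes. This is a minimal sketch assuming `<br>` and `<li>` are the only elements that should become bullets; the function name `clean_description` is hypothetical, and a production cleaner would likely need more rules.

```python
import html
import re


def clean_description(raw: str) -> str:
    """Turn a product-description HTML fragment into bullet-separated text."""
    # 1. Drop <script>/<style> blocks together with their contents.
    text = re.sub(r"<(script|style)\b[^>]*>.*?</\1\s*>", "", raw,
                  flags=re.IGNORECASE | re.DOTALL)
    # 2. Turn line breaks and list items into bullet separators.
    text = re.sub(r"<br\s*/?>|<li\b[^>]*>", " • ", text, flags=re.IGNORECASE)
    # 3. Remove remaining tags; a space keeps adjacent blocks from merging.
    text = re.sub(r"<[^>]+>", " ", text)
    # 4. Decode entities such as &amp; and &nbsp;.
    text = html.unescape(text)
    # 5. Collapse whitespace, then collapse runs of bullets into one.
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"(?:\s*•\s*)+", " • ", text)
    # 6. Trim surrounding spaces and dangling bullets.
    return text.strip(" •")
```

Removing scripts before stripping tags matters: stripping tags first would leave the script body (`alert('promo');`) behind as visible text.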
| Field | Completion % |
|---|---|
| Description | 100% |
| Image | 100% |
| MPN/UPC | 94% |
| Weight | 87% |
| Color | 91% |
| Other Specs | 85% |
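Completion percentages like these, and the "Average Fields per SKU" figure in the results table, can be computed directly from the exported records. A minimal sketch, assuming each record is a dict and that "complete" simply means non-blank (the case study does not define its verification criteria). The function names are hypothetical.

```python
def _filled(value) -> bool:
    """A field counts as filled if it is not None and not just whitespace."""
    return value is not None and str(value).strip() != ""


def field_completion(records, fields):
    """Percentage of records (rounded to 0.1) with a filled value per field."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: round(100 * sum(_filled(r.get(f)) for r in records) / total, 1)
            for f in fields}


def average_filled_fields(records, fields):
    """Mean number of filled fields per record."""
    if not records:
        return 0.0
    return sum(sum(_filled(r.get(f)) for f in fields) for r in records) / len(records)
```

With four records where one lacks a weight, `field_completion` reports `75.0` for Weight; the same per-field booleans summed per record give the average-fields metric.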
| Metric | Result |
|---|---|
| Total SKUs Scraped | 200 |
| Average Fields per SKU | 8.5 |
| Verified Images | 200 (100%) |
| Verified Descriptions | 200 (100%) |
| MPN / UPC Captured | 94% |
| Data Accuracy | 98.6% |
| Turnaround Time | 4 business days |
“Actowiz Solutions provided a complete product dataset, clean and verified. The SKU scraping was accurate, and every image and specification matched perfectly. We plan to expand to 2,000+ items with their support.”
— Product Data Manager, Chicago-based E-commerce Distributor
Actowiz Solutions ensures full compliance with international data protection standards and ethical data sourcing norms.
This case study demonstrates how Actowiz Solutions helped an eCommerce client automate SKU-level data extraction from multiple product websites, covering descriptions, images, and technical specs with near-perfect accuracy.
By combining Scrapy and Playwright in Python, the solution delivered verified, ready-to-use product data, reducing manual effort by over 90% and setting the stage for future large-scale catalog updates.
Whether for marketplaces, distributors, or analytics teams, Actowiz Solutions provides the tools and expertise to convert scattered web data into structured, actionable product intelligence.