
About the Client

Location: Chicago, USA

Industry: E-commerce & Product Distribution

Objective: To extract detailed product data for 200 SKUs (initial batch) — including images, descriptions, specifications, and identifiers (UPC/MPN) — from multiple manufacturer and retailer websites.

The client's ultimate goal was to build a comprehensive master catalog for internal listing and marketplace uploads. Upon successful delivery, the project would scale to 2,000+ SKUs across multiple brands and categories.

Project Overview

The client required structured product data suitable for uploading to marketplaces like Amazon, Walmart, Shopify, and eBay. Each SKU needed verified and enriched fields including:

Primary Fields (High Priority)
  • Product Name
  • Full Description
  • High-Resolution Image URLs
Secondary Fields
  • MPN / UPC / EAN
  • Weight & Dimensions
  • Color / Material
  • Technical Specifications
  • Category / Subcategory
  • Price (where public)

Output Format: Excel or CSV

Accuracy Target: ≥ 98% verified completeness

The Challenge

Extracting detailed product data across hundreds of SKUs seems simple, but it presents multiple technical and operational challenges:

  • Inconsistent Product Pages: Layout and tag structure varied from one manufacturer to the next, requiring site-specific scrapers.
  • Image Handling: Some websites store multiple image versions or use lazy loading, which complicates direct URL extraction.
  • Incomplete or Hidden Data: Many product pages hide MPN, weight, or specifications behind tabs or JavaScript.
  • Data Quality: Descriptions may include HTML tags, repeated text, or embedded special characters that must be cleaned.
  • Volume & Scalability: While the initial batch was 200 SKUs, the long-term plan involved 2,000+ SKUs, demanding scalable infrastructure.

Goals & Deliverables

Actowiz Solutions was asked to:

  • Scrape product data for 200 SKUs with no missing entries.
  • Deliver the following key fields per SKU:
    • Description (plain text, with HTML markup cleaned out)
    • Primary Image URL(s)
    • MPN/UPC/EAN
    • Weight, Color, Specifications
  • Output the dataset in Excel and CSV.
  • Maintain a data quality report outlining:
    • Total items processed
    • Fields completed per item
    • Missing or inferred values
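
A report of this shape could be assembled roughly as follows. This is a minimal sketch, not the project's actual code: the field names in REQUIRED_FIELDS, the "inferred" key on each record, and both function names are assumptions made for illustration.

```python
import csv
import io

# Hypothetical field list; the real schema is not given in the case study.
REQUIRED_FIELDS = ["name", "description", "image_url", "mpn", "upc", "weight", "color"]


def build_quality_report(records):
    """Summarize completeness per SKU plus the total number of items.

    Each record is a dict. An optional "inferred" key lists the fields whose
    values were inferred rather than scraped directly (an assumed convention).
    """
    rows = []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if not str(rec.get(f) or "").strip()]
        rows.append({
            "sku": rec.get("sku", ""),
            "fields_completed": len(REQUIRED_FIELDS) - len(missing),
            "missing": ";".join(missing),
            "inferred": ";".join(rec.get("inferred", [])),
        })
    return {"total_items": len(records), "items": rows}


def report_to_csv(report):
    """Serialize the per-SKU rows of the report to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["sku", "fields_completed", "missing", "inferred"]
    )
    writer.writeheader()
    writer.writerows(report["items"])
    return buffer.getvalue()
```

Keeping missing and inferred fields in separate columns lets a reviewer see at a glance which values were never found and which were filled in by a rule.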

Technology Stack

Function | Tools / Frameworks
Core Scraper | Python (Scrapy + Requests + Playwright)
HTML Parsing | BeautifulSoup4, lxml
JavaScript Handling | Playwright (headless Chromium)
Image Extraction | Regex patterning & attribute parsing
Data Cleaning | Pandas + Regular Expressions
Validation | UPC & MPN regex filters
Output | Excel / CSV via Pandas
Logging | Python logging + custom retry handler

Scraping Workflow

[ Product URLs / SKU List ]
        ↓
[ Scrapy Spider → Playwright Renderer (for dynamic sites) ]
        ↓
[ Extraction Layer ]
    → Product Title
    → Description (clean HTML)
    → Image URLs
    → Specs (Weight, Color, etc.)
        ↓
[ Validation & Deduplication ]
        ↓
[ Data Cleaning & Normalization ]
        ↓
[ Export → Excel / CSV + Quality Report ]

Implementation Highlights

1. Dynamic Rendering with Playwright

Many retailer websites used lazy-loaded content. Actowiz Solutions used a headless Playwright browser to render the DOM fully before parsing text and image elements.
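
The rendering step might look something like the sketch below. It assumes Playwright for Python is installed (`pip install playwright`, then `playwright install chromium`). The function name, the scroll distance, and the pause length are illustrative choices, not details from the project.

```python
def render_product_page(url, scroll_steps=5, pause_ms=500):
    """Return the rendered HTML of a page, scrolling to trigger lazy loading."""
    # Imported lazily so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles before touching the DOM.
            page.goto(url, wait_until="networkidle")
            for _ in range(scroll_steps):
                # Scroll down so lazy-loaded images enter the viewport.
                page.mouse.wheel(0, 2000)
                page.wait_for_timeout(pause_ms)
            return page.content()
        finally:
            browser.close()
```

The returned HTML string can then be handed to whichever parser the pipeline uses, so only pages that actually need JavaScript pay the cost of a browser.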

2. Smart Description Cleaning

Descriptions were cleaned with regex rules to remove extra line breaks, tags, and irrelevant scripts, while maintaining bullet points and formatting.

3. Multi-Image Extraction

For each SKU, all <img> tags inside the product gallery section were scraped and converted into full URLs.
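
One way to sketch this step with only the Python standard library is shown below. The container class name "product-gallery", the preference for a `data-src` attribute, and the fallback to the Open Graph image are assumptions for illustration. Real sites use their own markup, and the project's stack lists BeautifulSoup4 and lxml rather than `html.parser`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class GalleryImageParser(HTMLParser):
    """Collect image URLs from <img> tags inside a gallery <div>."""

    def __init__(self, base_url, gallery_class="product-gallery"):
        super().__init__()
        self.base_url = base_url
        self.gallery_class = gallery_class  # hypothetical class name
        self.depth = 0        # <div> nesting depth inside the gallery
        self.images = []
        self.og_image = None  # value of <meta property="og:image">, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            self.og_image = attrs.get("content")
        elif tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth or self.gallery_class in classes:
                self.depth += 1
        elif tag == "img" and self.depth:
            # Lazy-loading scripts often keep the real URL in data-src.
            src = attrs.get("data-src") or attrs.get("src")
            if src:
                url = urljoin(self.base_url, src)
                if url not in self.images:
                    self.images.append(url)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def extract_image_urls(html, base_url):
    """Return absolute gallery image URLs, or the og:image as a fallback."""
    parser = GalleryImageParser(base_url)
    parser.feed(html)
    if parser.images:
        return parser.images
    return [urljoin(base_url, parser.og_image)] if parser.og_image else []
```

`urljoin` resolves relative paths against the page URL and leaves absolute URLs unchanged, which is what "converted into full URLs" amounts to in practice.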

4. MPN/UPC Validation

Patterns like:

(\b\d{8,14}\b)

were used to detect candidate numeric UPC/EAN codes (8 to 14 digits); a pattern match alone does not confirm the check digit. Alphanumeric MPNs were standardized to uppercase.
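
A stricter filter can pair the pattern with the standard GS1 mod-10 check digit. The sketch below is an illustration under that assumption. The case study does not say whether check digits were verified, and the function names are hypothetical.

```python
import re

CANDIDATE_CODE = re.compile(r"\b\d{8,14}\b")  # pattern quoted in the case study
GTIN_LENGTHS = {8, 12, 13, 14}                # GTIN-8, UPC-A, EAN-13, GTIN-14


def has_valid_check_digit(code):
    """Apply the GS1 mod-10 check to a numeric GTIN string."""
    if not (code.isascii() and code.isdigit()) or len(code) not in GTIN_LENGTHS:
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def find_valid_codes(text):
    """Return numeric candidates in text that also pass the check-digit test."""
    return [m for m in CANDIDATE_CODE.findall(text) if has_valid_check_digit(m)]


def normalize_mpn(mpn):
    """Uppercase an MPN and trim surrounding whitespace."""
    return mpn.strip().upper()
```

The length check matters because the 8 to 14 digit pattern also matches phone numbers and internal part counters that happen to be numeric.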

5. Specification Parsing

Product detail tables were mapped into key-value pairs. Example:

Attribute | Extracted Value
Weight | 1.5 kg
Dimensions | 25x18x9 cm
Color | Blue
Material | Aluminum
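
The mapping from a detail table to key-value pairs can be sketched as below, assuming a simple two-column HTML table with one label cell and one value cell per row. The class and function names are hypothetical, and the weight helper only handles the "kg" and "g" units.

```python
import re
from html.parser import HTMLParser


class SpecTableParser(HTMLParser):
    """Turn two-cell table rows (label, value) into a dict."""

    def __init__(self):
        super().__init__()
        self.specs = {}
        self._row = None   # cells of the row being read, None outside a row
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace and trim the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 2 and self._row[0]:
                self.specs[self._row[0].rstrip(":")] = self._row[1]
            self._row = None


def parse_specs(html):
    """Return {label: value} for every two-cell row in the HTML."""
    parser = SpecTableParser()
    parser.feed(html)
    return parser.specs


def parse_weight_kg(value):
    """Convert "1.5 kg" or "500 g" to kilograms; None if unparseable."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(kg|g)\s*", value, flags=re.IGNORECASE)
    if not match:
        return None
    number, unit = float(match.group(1)), match.group(2).lower()
    return number if unit == "kg" else number / 1000
```

Rows with more or fewer than two cells are skipped instead of guessed at, so headers and spacer rows do not pollute the result.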

6. Deduplication & Quality Assurance

If multiple sources listed the same SKU, the scraper prioritized official brand/manufacturer data for consistency.
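
A source-priority merge along these lines is sketched below. The source labels and their ranking are hypothetical, and filling gaps from lower-priority sources is an added assumption; the case study only states that manufacturer data was preferred.

```python
# Lower rank wins. These source labels are hypothetical.
SOURCE_PRIORITY = {"manufacturer": 0, "brand": 0, "retailer": 1, "marketplace": 2}


def merge_sku_records(records):
    """Collapse multiple records per SKU into one.

    Values come from the highest-priority source. A lower-priority source
    is used only to fill fields the preferred source left empty.
    """
    by_sku = {}
    for rec in records:
        by_sku.setdefault(rec["sku"], []).append(rec)

    merged = []
    for group in by_sku.values():
        # sorted() is stable, so sources with equal rank keep input order.
        group = sorted(group, key=lambda r: SOURCE_PRIORITY.get(r.get("source"), 99))
        result = {}
        for rec in group:
            for field, value in rec.items():
                if field == "source" or value in (None, ""):
                    continue
                if result.get(field) in (None, ""):
                    result[field] = value
        # Record which source supplied the preferred values.
        result["source"] = group[0].get("source")
        merged.append(result)
    return merged
```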

Sample Dataset (Illustrative)

SKU | Product Name | Description | MPN | UPC | Weight | Color | Image URL | City of Origin
SKU001 | Stainless Steel Travel Mug 500ml | Insulated 500ml stainless mug with spill-proof lid. | TM-500SS | 87432948172 | 0.5 kg | Silver | https://example.com/mug.jpg | Chicago
SKU002 | Noise-Canceling Headphones | Wireless headphones with 20h battery life. | NC-H200 | 098432874321 | 0.9 kg | Black | https://example.com/headphones.jpg | Boston
SKU003 | Yoga Mat Eco 6mm | Non-slip eco mat with carrying strap. | YM-ECO6 | 88213457492 | 1.2 kg | Blue | https://example.com/yogamat.jpg | Austin
SKU004 | Smartwatch Sport 4.0 | Waterproof smartwatch with heart rate monitor. | SW-4SPORT | 987654112345 | 0.35 kg | Red | https://example.com/watch.jpg | Miami

Data Cleaning Example

Before Cleaning:
<p><b>Features:</b><br>High-quality material<br>Available in red and black<br><script>alert('promo');</script></p>
After Cleaning:

Features: • High-quality material • Available in red and black
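
The before/after pair above can be reproduced with a short parser-based cleaner. This sketch uses Python's `html.parser` in place of the regex rules the project describes, since a parser drops `<script>` bodies more reliably. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DescriptionCleaner(HTMLParser):
    """Strip markup, drop script/style content, and split text at line breaks."""

    SKIPPED = {"script", "style"}  # content dropped entirely
    BREAKS = {"br", "li", "p"}     # tags that start a new text segment

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = [[]]
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BREAKS:
            self.segments.append([])

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.segments[-1].append(data)


def clean_description(raw_html):
    """Return plain text with non-empty segments joined by bullet marks."""
    parser = DescriptionCleaner()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace inside each segment, then drop empty segments.
    parts = [" ".join("".join(seg).split()) for seg in parser.segments]
    return " • ".join(p for p in parts if p)
```

Joining segments with a bullet keeps the list structure of the original description visible in a single spreadsheet cell.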

Infographic Concept – "How Actowiz Scrapes SKU-Level Product Data"


Chart Example – Data Completeness by Field

Field | Completion %
Description | 100%
Image | 100%
MPN/UPC | 94%
Weight | 87%
Color | 91%
Other Specs | 85%
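
Per-field percentages like these follow from counting non-empty values across all records. The sketch below assumes records are plain dicts and treats `None` and whitespace-only strings as missing. The function name is hypothetical.

```python
def field_completeness(records, fields):
    """Return {field: percentage of records with a non-empty value}."""
    if not records:
        return {field: 0.0 for field in fields}
    result = {}
    for field in fields:
        filled = 0
        for rec in records:
            value = rec.get(field)
            # None and whitespace-only strings count as missing.
            if value is not None and str(value).strip():
                filled += 1
        result[field] = round(100 * filled / len(records), 1)
    return result
```

For example, two filled weights out of four records gives 50.0 for that field.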

Quality Metrics

Metric | Result
Total SKUs Scraped | 200
Average Fields per SKU | 8.5
Verified Images | 200 (100%)
Verified Descriptions | 200 (100%)
MPN / UPC Captured | 94%
Data Accuracy | 98.6%
Turnaround Time | 4 business days

Challenges Solved

  • Multiple Layouts, One Parser: Unified schema mapping allowed extraction from dozens of page structures.
  • Image Parsing Consistency: Implemented fallback logic that reads the <meta property="og:image"> tag when gallery images could not be extracted.
  • Data Gaps: Implemented rule-based inference (e.g., color from title text).
  • Scalability: Pipeline prepared for 10× more SKUs without performance issues.
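
The rule-based inference mentioned in the list above (color taken from the title text) could look like the following sketch. The color vocabulary, the "inferred" flag, and the function name are illustrative assumptions, not details from the project.

```python
import re

# Illustrative vocabulary; a production list would be larger.
KNOWN_COLORS = ["black", "white", "silver", "blue", "red", "green", "grey", "gray"]
_COLOR_RE = re.compile(r"\b(" + "|".join(KNOWN_COLORS) + r")\b", re.IGNORECASE)


def fill_color(record):
    """Fill a missing color from the product title and flag it as inferred."""
    if str(record.get("color") or "").strip():
        return record  # a scraped color always wins over inference
    match = _COLOR_RE.search(record.get("name", ""))
    if match:
        # Copy the record so the caller's dict is not mutated.
        record = dict(record, color=match.group(1).capitalize())
        record["inferred"] = list(record.get("inferred", [])) + ["color"]
    return record
```

Flagging inferred values keeps them distinguishable from scraped ones, which is what the "missing or inferred values" line of the quality report relies on.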

Project Outcome

  • Delivered a clean Excel dataset covering all 200 products, with every priority field populated.
  • Met 100% of priority field requirements.
  • Validated images and descriptions for eCommerce listing readiness.
  • Developed scraper capable of handling 2,000+ SKUs in future batches.

Client Benefits

  • Faster Catalog Creation: Eliminated manual entry time.
  • Accurate Product Data: Improved buyer confidence on listings.
  • Marketplace Compatibility: CSV ready for Amazon, Shopify, WooCommerce.
  • Scalable Framework: Easily expandable to thousands of SKUs.

Client Testimonial

“Actowiz Solutions provided a complete product dataset, clean and verified. The SKU scraping was accurate, and every image and specification matched perfectly. We plan to expand to 2,000+ items with their support.”

— Product Data Manager, Chicago-based E-commerce Distributor

Ethical & Compliance Practices

  • Collected publicly available product information only.
  • No bypassing of CAPTCHAs or restricted data sources.
  • Followed robots.txt and fair-use scraping policies.
  • Data used solely for internal product catalog creation.

Actowiz Solutions ensures full compliance with international data protection standards and ethical data sourcing norms.

Why Choose Actowiz Solutions

  • Experience with large-scale SKU & eCommerce scraping.
  • Expertise in image extraction, HTML cleaning, and spec mapping.
  • Scalable, reliable Python-based infrastructure.
  • End-to-end delivery with accuracy and documentation.

Future Enhancements

  • Add real-time price monitoring from multiple retailers.
  • Integrate AI-based product classification by category.
  • Enable API-based delivery to sync SKU data automatically.
  • Expand to competitor benchmarking and stock tracking.

Conclusion

This case study demonstrates how Actowiz Solutions helped an eCommerce client automate SKU-level data extraction from multiple product websites, covering descriptions, images, and technical specs with a measured data accuracy of 98.6%.

By combining Scrapy and Playwright in Python, the solution delivered verified, ready-to-use product data, reducing manual effort by over 90% and setting the stage for future large-scale catalog updates.

Whether for marketplaces, distributors, or analytics teams, Actowiz Solutions provides the tools and expertise to convert scattered web data into structured, actionable product intelligence.
