Introduction

Matthew

Published: December 2025

Fashion eCommerce is no longer just about text-based product information. Today, product images shape everything from style, color accuracy, and fit expectations to fabric texture, variant visibility, and brand perception.

But for retailers, analytics teams, and catalog managers, extracting and classifying thousands of fashion images by hand is impractical.

This technical guide walks you through:

how to scrape fashion product images
how to download, process, and store them
how to classify them using AI
how to extract colors, textures, and product types
how Actowiz Solutions handles all of this at enterprise scale

If you're ready to learn, let’s dive in.

If not?

Actowiz Solutions already provides enterprise-grade image extraction + classification pipelines that handle millions of SKUs across global fashion platforms.

Tools You’ll Use in This Tutorial

Install:

pip install selenium
pip install pillow
pip install tensorflow
pip install opencv-python
pip install requests
pip install scikit-learn

You'll use:

Selenium → To load fashion websites
Requests → To download images
Pillow/OpenCV → To process images
TensorFlow (MobileNet) → To classify outfits
scikit-learn → To cluster pixel colors with K-Means

Step 1: Launch Selenium and Load a Fashion Product Page

Example: scraping H&M product images.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from time import sleep
import requests
from PIL import Image
from io import BytesIO
import tensorflow as tf
import numpy as np
import json

browser = webdriver.Chrome()
browser.get("https://www2.hm.com/en_in/men/shop-by-product/t-shirts.html")
sleep(3)

# Scroll for lazy loading:
for _ in range(6):
    browser.find_element(By.TAG_NAME, "body").send_keys(Keys.END)
    sleep(2)
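A fixed number of scrolls can stop too early on long listings or waste time on short ones. The following is a minimal sketch of an alternative, assuming the page grows `document.body.scrollHeight` as lazy-loaded products arrive. The helper name `scroll_until_stable` and its parameters are hypothetical; only Selenium's `execute_script` method is relied on.

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=20, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object with Selenium's execute_script method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, so we reached the end
        last_height = new_height
    return rounds
```

It could be called as `scroll_until_stable(browser)` in place of the fixed loop; `max_rounds` caps the work on pages that scroll endlessly.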

Step 2: Extract Product Image URLs, Titles & Prices

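# The XPath and class names below depend on the site's current markup
# and may need updating.
products = browser.find_elements(By.XPATH, '//article[contains(@class,"product-item")]')

extracted_data = []

for product in products:
    # Take the first image inside each product card.
    img_tag = product.find_element(By.TAG_NAME, "img")
    img_url = img_tag.get_attribute("src")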

    title = product.find_element(By.CLASS_NAME, "item-heading").text
    price = product.find_element(By.CLASS_NAME, "item-price").text

    extracted_data.append({
        "title": title,
        "price": price,
        "image_url": img_url
    })
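Product images are often served in several sizes through the standard HTML `srcset` attribute, so the `src` value alone may point to a small thumbnail. The sketch below picks the widest candidate from a `srcset` string and falls back to `src`. The function name `pick_image_url` is hypothetical, and it assumes width descriptors (such as `600w`) and URLs that contain no commas.

```python
def pick_image_url(src, srcset=None):
    """Return the widest candidate from a srcset string, else fall back to src."""
    best_url, best_width = src, -1
    for candidate in (srcset or "").split(","):
        parts = candidate.strip().split()
        # Only width descriptors such as "600w" are considered here.
        if len(parts) == 2 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
            if width > best_width:
                best_url, best_width = parts[0], width
    return best_url
```

Inside the product loop this could be used as `pick_image_url(img_tag.get_attribute("src"), img_tag.get_attribute("srcset"))`.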

Step 3: Download Images Locally

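def download_image(url):
    # Fail fast on network problems instead of hanging or parsing an error page.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Convert to RGB so later steps always receive three color channels.
    img = Image.open(BytesIO(response.content)).convert("RGB")
    return img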

sample_image = download_image(extracted_data[0]["image_url"])
sample_image.show()
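The function above keeps the image in memory only. To store images on disk, one option is to derive a stable file name from the URL. This is a minimal sketch using only the standard library; the helper names `local_path_for` and `save_image_bytes`, the `images` folder, and the `.jpg` fallback extension are all assumptions.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse

def local_path_for(url, folder="images"):
    """Build a stable local file path for an image URL."""
    # Keep the original extension when the URL has one, else assume .jpg.
    suffix = Path(urlparse(url).path).suffix.lower() or ".jpg"
    # Hashing the URL avoids collisions between files with the same name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(folder) / f"{digest}{suffix}"

def save_image_bytes(url, content, folder="images"):
    """Write already-downloaded bytes to disk and return the path."""
    path = local_path_for(url, folder)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path
```

With the download step above, `save_image_bytes(url, response.content)` would persist the raw bytes, and the same URL always maps to the same file.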

Step 4: Classify Fashion Images Using MobileNetV2

Load the model:

model = tf.keras.applications.MobileNetV2(weights="imagenet")

Prepare images:

def prepare_image(img):
    # MobileNetV2 expects 224x224 RGB input scaled to [-1, 1].
    img = img.convert("RGB").resize((224, 224))
    img_array = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.array(img, dtype=np.float32)
    )
    return np.expand_dims(img_array, axis=0)

Predict:

preds = model.predict(prepare_image(sample_image))
decoded = tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3)[0]
print(decoded)

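Example output (illustrative; decode_predictions returns (class_id, label, score) tuples, and the scores are softmax probabilities, so they sum to at most 1):

[('n03595614', 'jersey', 0.82), ('n04370456', 'sweatshirt', 0.09), ('n02963159', 'cardigan', 0.03)]

ImageNet has no separate "t-shirt" label; T-shirts fall under "jersey".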
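ImageNet labels are not catalog categories, so the predictions usually need a mapping step. The sketch below assumes the `(class_id, label, score)` tuples that `decode_predictions` returns, sorted by descending score. The mapping table, the `min_score` threshold, and the function name `to_fashion_category` are hypothetical and would need tuning for a real catalog.

```python
# Hypothetical mapping from ImageNet labels to catalog categories.
IMAGENET_TO_CATEGORY = {
    "jersey": "t-shirt",
    "sweatshirt": "sweatshirt",
    "cardigan": "knitwear",
    "jean": "jeans",
    "miniskirt": "skirt",
    "trench_coat": "outerwear",
    "running_shoe": "footwear",
    "sandal": "footwear",
}

def to_fashion_category(decoded, min_score=0.2):
    """Return the category of the best-scoring mapped label, or 'unknown'."""
    for _class_id, label, score in decoded:
        # Skip weak predictions and labels that are not clothing.
        if score >= min_score and label in IMAGENET_TO_CATEGORY:
            return IMAGENET_TO_CATEGORY[label]
    return "unknown"
```

Calling `to_fashion_category(decoded)` on the prediction from this step yields a single category string that can be stored with the product.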

Step 5: Detect Dominant Color (Using K-Means)

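from sklearn.cluster import KMeans

def get_dominant_color(img, k=3):
    # Downscale first: clustering every pixel of a full-size image is slow.
    pixels = np.array(img.convert("RGB").resize((100, 100))).reshape(-1, 3)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = kmeans.fit_predict(pixels)
    # The dominant color is the center of the most populated cluster,
    # not simply the first center in the list.
    dominant = np.bincount(labels, minlength=k).argmax()
    return kmeans.cluster_centers_[dominant]  # RGB output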

# Convert RGB into nearest human-readable color name.
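One simple way to do this conversion is a nearest-neighbor lookup against a small named palette. The sketch below uses squared Euclidean distance in RGB space, which is a rough approximation of perceived color difference. The palette values and the function name `nearest_color_name` are hypothetical.

```python
# A small, hypothetical palette; extend it to match your catalog's color names.
PALETTE = {
    "Black": (0, 0, 0),
    "White": (255, 255, 255),
    "Grey": (128, 128, 128),
    "Red": (220, 20, 60),
    "Green": (34, 139, 34),
    "Blue": (30, 90, 200),
    "Yellow": (250, 210, 40),
    "Beige": (225, 205, 170),
}

def nearest_color_name(rgb, palette=PALETTE):
    """Return the palette name with the smallest squared RGB distance."""
    r, g, b = (float(c) for c in rgb)

    def distance(name):
        pr, pg, pb = palette[name]
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    return min(palette, key=distance)
```

The result of the K-Means step can be passed in directly, for example `nearest_color_name(get_dominant_color(sample_image))`.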

Step 6: Save Classified Output to JSON

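with open("fashion_images.json", "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps characters such as ₹ readable in the file.
    json.dump(extracted_data, f, indent=4, ensure_ascii=False)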

Example record (once predicted_category and dominant_color have been added to each entry):

{
  "title": "Relaxed Fit Cotton Tee",
  "price": "₹799",
  "image_url": "https://image.hm.com/...jpg",
  "predicted_category": "t-shirt",
  "dominant_color": "White"
}
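The scraped entries from Step 2 hold only the title, price, and image URL, so the two classification fields have to be added before saving. A minimal sketch of that step, with hypothetical helper names `build_record` and `save_records`:

```python
import json

def build_record(item, predicted_category, dominant_color):
    """Return a copy of a scraped item with the two classification fields added."""
    record = dict(item)  # copy so the original entry is left untouched
    record["predicted_category"] = predicted_category
    record["dominant_color"] = dominant_color
    return record

def save_records(records, path):
    """Write records as UTF-8 JSON, keeping symbols such as ₹ readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4, ensure_ascii=False)
```

In practice each entry in `extracted_data` would be passed through `build_record` with the outputs of Steps 4 and 5, and the resulting list handed to `save_records`.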

Step 7: Scrape Multiple Images Per Product

Most fashion product pages contain:

front view
back view
model wearing shot
close-up fabric shot

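Modify the code inside the product loop:

img_tags = product.find_elements(By.TAG_NAME, "img")

all_images = []

for tag in img_tags:
    src = tag.get_attribute("src")
    # Skip tags without a src and avoid duplicate URLs.
    if src and src not in all_images:
        all_images.append(src)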

# Classify each image in a loop.
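With several images per product, the per-image predictions can disagree (a fabric close-up rarely looks like a T-shirt). One way to combine them is a majority vote. The sketch below assumes a `classify` callable that maps an image URL to a category string, for example a wrapper around the download and prediction steps; the function name `summarize_product` and the `"unknown"` placeholder are hypothetical.

```python
from collections import Counter

def summarize_product(image_urls, classify):
    """Classify every image of one product and return the majority category.

    `classify` is any callable that maps an image URL to a category string.
    """
    # Drop empty values and duplicates while keeping the original order.
    unique_urls = list(dict.fromkeys(u for u in image_urls if u))
    labels = [classify(url) for url in unique_urls]
    # Images the model could not place do not get a vote.
    votes = Counter(label for label in labels if label != "unknown")
    category = votes.most_common(1)[0][0] if votes else "unknown"
    return {"images": unique_urls, "labels": labels, "category": category}
```

The returned dictionary keeps the per-image labels as well, which helps when reviewing products where the views disagree.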

Full Combined Code (For Practical Use)

A combined version of the steps above is available as a single, clean Python script on request.

Limitations of This Approach

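This DIY approach has several limitations:

1. Fashion website structure changes

Sites change their markup regularly, and any change to the XPath or class names used here breaks the script.

2. High-resolution images

Large image files download slowly.

3. Anti-bot systems

Heavy scraping may trigger rate limits or blocks.

4. Model accuracy

Pre-trained models may misclassify fashion-specific images; MobileNetV2 trained on ImageNet knows only a handful of clothing classes.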
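Slow downloads and temporary blocks can be softened, though not removed, by retrying failed requests with increasing pauses. The following is a minimal sketch of exponential backoff using only the standard library; the helper name `with_retries` and its default values are hypothetical.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * (2 ** attempt))
```

A download could then be wrapped as `with_retries(lambda: download_image(url))`. Retrying does not bypass anti-bot systems; it only makes the script more tolerant of transient failures.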

When to Use Actowiz Solutions Instead of DIY Scripts?

DIY scripts are good for learning, but not for production.

If you need:

millions of images
20,000+ SKUs
high-speed crawling
GPU-based classification
variant linking
quality scoring
defect detection
fabric recognition
color consistency checks

Then you need Actowiz Solutions' Image Intelligence Suite, powered by:

scalable crawlers
global IP rotation
AI classification
enterprise APIs
automated pipelines
full catalog mapping

Conclusion

This tutorial showed you how to:

scrape fashion products
download images
classify them using AI
detect colors
organize data in JSON

Modern fashion brands rely on this intelligence to:

understand trends
clean their catalogs
personalize recommendations
improve visual search
standardize product listings
benchmark competitors

And Actowiz Solutions delivers this entire pipeline at enterprise scale.

You can also reach us for all your mobile app scraping, data collection, web scraping, and instant data scraper service requirements!
