SKU mapping is one of the hardest problems in retail analytics, especially in electronics, where the same product appears in dozens of formats.
A single smartphone such as the Samsung Galaxy A34 5G can appear in 35–50 listing variations across retailers.
If you're mapping 20,000 SKUs, manual processing is impossible.
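This tutorial shows you how to scrape product listings, extract model codes, normalize titles, fuzzy-match similar titles, group listings into SKU families, and export the result as JSON.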
This is the same workflow Actowiz Solutions uses for large-scale retail analytics projects for GCC electronics retailers.
Install all external libraries:
pip install selenium
pip install pandas
pip install fuzzywuzzy
pip install python-Levenshtein
pip install requests
pip install beautifulsoup4
You will use:
Basic imports:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import pandas as pd
from time import sleep
import re
from fuzzywuzzy import fuzz
import json
Open product category:
browser = webdriver.Chrome()
browser.get("https://www.amazon.ae/s?k=smartphones")
sleep(3)
Scroll for pagination:
for _ in range(8):
    browser.find_element(By.TAG_NAME, "body").send_keys(Keys.END)
    sleep(2)
# Each search result sits in its own container div
items = browser.find_elements(By.XPATH, '//div[@data-component-type="s-search-result"]')
records = []
for item in items:
    # A missing element falls back to an empty string instead of aborting the loop
    try:
        title = item.find_element(By.TAG_NAME, "h2").text
    except Exception:
        title = ""
    try:
        price = item.find_element(By.CLASS_NAME, "a-price-whole").text
    except Exception:
        price = ""
    try:
        link = item.find_element(By.TAG_NAME, "a").get_attribute("href")
    except Exception:
        link = ""
    records.append({
        "title": title,
        "price": price,
        "url": link
    })
Load the first batch into a DataFrame:
df = pd.DataFrame(records)
df.head()
This becomes your raw SKU dataset.
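The scraped price is still text (for example "1,199"), so it cannot be compared or sorted numerically yet. The following is a minimal sketch of one way to convert it, assuming commas are thousands separators and a dot is the decimal point. The helper name `parse_price` is hypothetical and not part of the original script.

```python
import re
from typing import Optional

def parse_price(text: str) -> Optional[float]:
    """Convert scraped price text such as '1,199' to a float, or None if unusable."""
    # Keep only digits and dots (assumption: ',' is a thousands separator)
    cleaned = re.sub(r"[^\d.]", "", text or "")
    # A trailing dot can remain when the fractional part lives in another element
    cleaned = cleaned.rstrip(".")
    try:
        return float(cleaned)
    except ValueError:
        return None

print(parse_price("1,199"))  # 1199.0
print(parse_price(""))       # None
```

With the DataFrame above, this could be applied as `df["price_value"] = df["price"].apply(parse_price)`, where the column name `price_value` is only an illustration.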
Electronics SKUs usually contain model codes such as SM-A546E, A2633, or CPH2551.
Use regex:
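def extract_model(title):
    # Candidate tokens: runs of 4+ letters, digits, or hyphens
    tokens = re.findall(r"[A-Za-z0-9-]{4,}", title)
    # Keep tokens that mix letters and digits, skipping storage sizes such as 128GB
    return [
        t for t in tokens
        if re.search(r"[A-Za-z]", t) and re.search(r"\d", t)
        and not re.fullmatch(r"\d+(?:GB|TB|MB)", t, re.IGNORECASE)
    ]
df["model_codes"] = df["title"].apply(extract_model)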
Example result:
| Title | Extracted Model Code |
|---|---|
| Samsung A54 5G SM-A546E | SM-A546E |
| iPhone 13 A2633 (128GB) | A2633 |
| OnePlus Nord CE CPH2551 | CPH2551 |
This is the most important step in SKU mapping.
Remove noise words:
def clean_title(t):
    t = t.lower()
    remove = ["official", "2025 version", "brand new", "uae version", "global version"]
    for r in remove:
        t = t.replace(r, "")
    # Collapse the extra spaces left behind by the removed phrases
    return " ".join(t.split())
df["clean_title"] = df["title"].apply(clean_title)
This makes titles more consistent across platforms.
For SKU mapping, we need to identify when two differently worded titles describe the same product.
FuzzyWuzzy helps:
def is_match(t1, t2, threshold=80):
    score = fuzz.token_set_ratio(t1, t2)
    return score >= threshold
Example:
is_match("samsung galaxy a54 5g", "galaxy a54 samsung phone 5g")
# Output: True
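To see why word order and extra words barely affect a token-set score, it can help to look at a simplified version of the idea. The sketch below is a rough standard-library approximation built on `difflib`, not FuzzyWuzzy's actual implementation, and the function name `token_set_similarity` is hypothetical. It splits both titles into word sets, then compares the shared words against each title's shared-plus-unique words and keeps the best score.

```python
from difflib import SequenceMatcher

def token_set_similarity(a: str, b: str) -> int:
    """Rough token-set style similarity between two titles, from 0 to 100."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    # Words shared by both titles, and words unique to each side
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = f"{common} {only_a}".strip()
    combined_b = f"{common} {only_b}".strip()
    # Score the three pairings and keep the highest one
    pairs = [(common, combined_a), (common, combined_b), (combined_a, combined_b)]
    best = max(SequenceMatcher(None, x, y).ratio() for x, y in pairs)
    return round(100 * best)

print(token_set_similarity("samsung galaxy a54 5g", "galaxy a54 samsung phone 5g"))  # 100
```

Because every word of the first title also appears in the second, the shared-word string equals the first title's combined string, which is why the score reaches 100 despite the extra word "phone".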
Here is a simple grouping logic:
sku_groups = {}
for i, row in df.iterrows():
    model = tuple(row["model_codes"])
    if model not in sku_groups:
        sku_groups[model] = []
    sku_groups[model].append({
        "title": row["title"],
        "price": row["price"],
        "url": row["url"],
        "clean_title": row["clean_title"]
    })
This gathers all listings that share the same extracted model codes into one family.
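A key built directly from the list of extracted codes depends on their order and letter case, so "sm-a546e" and "SM-A546E" would land in different families. The sketch below shows one possible normalization, assuming codes can be compared case-insensitively. The helper `family_key` and the sample listings are illustrative and not part of the original script.

```python
from collections import defaultdict

def family_key(model_codes):
    """Build an order- and case-insensitive string key from a list of model codes."""
    return "|".join(sorted({code.upper() for code in model_codes}))

# Illustrative listings; in the tutorial these would come from the DataFrame rows
listings = [
    {"title": "Samsung Galaxy A54 5G SM-A546E", "model_codes": ["SM-A546E"]},
    {"title": "samsung a54 sm-a546e dual sim", "model_codes": ["sm-a546e"]},
    {"title": "iPhone 13 A2633 (128GB)", "model_codes": ["A2633"]},
]

families = defaultdict(list)
for listing in listings:
    families[family_key(listing["model_codes"])].append(listing["title"])

print(dict(families))
```

A plain string key also has the advantage that it can be written to JSON without conversion.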
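# JSON object keys must be strings, so convert the tuple keys before dumping
with open("sku_mapping_output.json", "w") as f:
    json.dump({str(k): v for k, v in sku_groups.items()}, f, indent=4)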
Final structure looks like:
{
    "('SM-A546E',)": [
        {
            "title": "Samsung Galaxy A54 5G 8GB 256GB SM-A546E",
            "price": "1199",
            "url": "https://www.amazon.ae/..."
        },
        {
            "title": "Samsung A54 5G Dual SIM 256GB",
            "price": "1150",
            "url": "https://www.noon.com/..."
        }
    ]
}
This is SKU mapping for one product family.
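The second listing in the sample output has no model code in its title, so grouping by extracted codes alone would not place it in this family. One possible fallback is to attach such titles to the family whose existing titles look most similar. The sketch below uses `difflib` from the standard library as a stand-in for FuzzyWuzzy. The helper names and the 0.6 threshold are assumptions that would need tuning on real data.

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles, ignoring word order and case."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()

def best_family(title, families, threshold=0.6):
    """Return the key of the most similar family, or None if nothing reaches the threshold."""
    best_key, best_score = None, threshold
    for key, titles in families.items():
        score = max(title_similarity(title, t) for t in titles)
        if score >= best_score:
            best_key, best_score = key, score
    return best_key

# Illustrative families keyed by model code
families = {
    "SM-A546E": ["Samsung Galaxy A54 5G 8GB 256GB SM-A546E"],
    "A2633": ["iPhone 13 A2633 (128GB)"],
}
print(best_family("Samsung A54 5G Dual SIM 256GB", families))
```

Titles that return `None` are better left unassigned for manual review than forced into the nearest family.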
Repeat until you process 20,000 SKUs.
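Once listings are grouped, each family can be summarized, for example by its lowest and highest price. The sketch below assumes the JSON structure shown above, with prices stored as strings. The function name `price_spread` is hypothetical.

```python
import json

def price_spread(groups):
    """Return {family: (min_price, max_price)} for families with at least one numeric price."""
    spread = {}
    for family, listings in groups.items():
        prices = []
        for listing in listings:
            try:
                prices.append(float(listing["price"].replace(",", "")))
            except (KeyError, ValueError, AttributeError):
                continue  # skip listings without a parsable price
        if prices:
            spread[family] = (min(prices), max(prices))
    return spread

# Sample data in the same shape as sku_mapping_output.json
sample = json.loads("""
{
  "('SM-A546E',)": [
    {"title": "Samsung Galaxy A54 5G 8GB 256GB SM-A546E", "price": "1199"},
    {"title": "Samsung A54 5G Dual SIM 256GB", "price": "1150"}
  ]
}
""")
print(price_spread(sample))  # {"('SM-A546E',)": (1150.0, 1199.0)}
```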
These are crucial for retailers.
This tutorial script works, but large-scale SKU mapping faces issues:
Requires proxies, residential IPs, and anti-bot solutions.
Requires smarter regex + NLP-based extraction.
Local machine will not handle this scale.
Requires vector-based similarity using sentence transformers.
Slight errors can mis-map SKUs.
If your project involves:
…then manual coding is not enough.
Actowiz Solutions provides:
Electronics, fashion, grocery — Actowiz maps SKUs across all industries.
Mapping 20,000 SKUs across multiple electronics retailers is a complex problem involving:
The tutorial above gives you the complete technical pipeline to build your own SKU mapping solution.
But if you want an automated, scalable, production-ready system for UAE retail — Actowiz Solutions can deploy a fully managed SKU mapping framework tailored to your needs.
You can also reach us for all your mobile app scraping, data collection, web scraping, and instant data scraper service requirements!