PAN India RERA Data Scraping: Building a RERA Data Aggregation Engine

Introduction

India’s real estate ecosystem has become increasingly dependent on RERA (Real Estate Regulatory Authority) data. Every state maintains a separate RERA portal with:

project registrations
builder profiles
completion updates
approvals / certificates
litigation & complaint reports
status updates (ongoing / completed / withdrawn)
land details & documentation
project financial disclosures

Brands, investors, lending institutions, property portals, valuation firms, and consultants all depend on this data.

But PAN India RERA data scraping is extremely challenging because:

Every state has a different website
Different HTML layouts
Different pagination & filters
Many portals use CAPTCHA
Key documents are PDFs that must be downloaded, parsed, and indexed
The same builder is spelled differently across portals
Project names use inconsistent formats

This tutorial shows how to build a RERA Data Aggregation Engine using:

Selenium
Requests
BeautifulSoup
PDF extractors (pypdf, tabula-py)
Rule-based and fuzzy name normalization
PAN India merging logic

This is the same technical framework Actowiz Solutions deploys for its real estate intelligence clients.

Step 1: Install Dependencies

pip install selenium
pip install beautifulsoup4
pip install lxml
pip install pandas
pip install requests
pip install pypdf
pip install tabula-py
pip install fuzzywuzzy
pip install python-Levenshtein

Step 2: Understand the RERA Portal Structure

Each state has its own portal:

State	Portal
Maharashtra	https://maharera.mahaonline.gov.in
Gujarat	https://gujrera.gujarat.gov.in
Karnataka	https://rera.karnataka.gov.in
Delhi	https://rera.delhi.gov.in
Tamil Nadu	https://www.tnrera.in
Rajasthan	https://rera.rajasthan.gov.in
Telangana	https://rera.telangana.gov.in

Common data elements:

Project Name
Promoter Name
RERA Registration ID
District / City
Project Type (Residential / Commercial / Mixed)
Project Status
Start & End Date
Approved Plans
Quarterly Updates (QPR)
Financial disclosures
Litigation / complaints
Uploaded documents (PDFs)

Step 3: Launch Selenium and Open a RERA State Portal (Example: Maharashtra)

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

browser = webdriver.Chrome()
browser.get("https://maharera.mahaonline.gov.in/SearchList/Search")
sleep(3)

Step 4: Apply Filters (e.g., Project Name or District)

Example: Select “Mumbai Suburban”

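from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# DistrictID is assumed to be a <select> (the original XPath targets its <option> tags)
district_dropdown = Select(browser.find_element(By.ID, "DistrictID"))
district_dropdown.select_by_visible_text("Mumbai Suburban")

search_btn = browser.find_element(By.ID, "btnSearch")
search_btn.click()

# Wait up to 15 s for result rows instead of sleeping a fixed 4 s
WebDriverWait(browser, 15).until(
    EC.presence_of_element_located((By.XPATH, '//table[@id="tblProjects"]/tbody/tr'))
)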

Step 5: Extract All Rows From the Results Table

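from selenium.common.exceptions import NoSuchElementException

rows = browser.find_elements(By.XPATH, '//table[@id="tblProjects"]/tbody/tr')

rera_records = []

for row in rows:
    cols = row.find_elements(By.TAG_NAME, "td")
    if len(cols) < 6:  # skip header/spacer rows without the six expected columns
        continue
    try:
        details_url = cols[1].find_element(By.TAG_NAME, "a").get_attribute("href")
    except NoSuchElementException:
        details_url = None  # row has no link to a project page
    rera_records.append({
        "rera_id": cols[0].text.strip(),
        "project_name": cols[1].text.strip(),
        "promoter": cols[2].text.strip(),
        "district": cols[3].text.strip(),
        "type": cols[4].text.strip(),
        "status": cols[5].text.strip(),
        "details_url": details_url,
    })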
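The column positions above (ID, name, promoter, district, type, status) follow the Maharashtra results table as described in this tutorial; other portals order their columns differently. One way to keep that difference out of the scraping loop is a per-state layout table. This is a sketch under assumptions: COLUMN_LAYOUTS and row_to_record are hypothetical names, and only the Maharashtra entry mirrors the code above. Layouts for other states would have to be written after inspecting each portal.

```python
from typing import Optional

# Hypothetical per-state column order for the search-results table.
# Only the Maharashtra layout mirrors the Step 5 code; add other states
# only after checking their portals.
COLUMN_LAYOUTS = {
    "Maharashtra": ["rera_id", "project_name", "promoter", "district", "type", "status"],
}

def row_to_record(state: str, cells: list) -> Optional[dict]:
    """Map one row's cell texts to named fields using the state's layout."""
    layout = COLUMN_LAYOUTS[state]
    if len(cells) < len(layout):
        return None  # header, spacer, or malformed row
    record = {name: cell.strip() for name, cell in zip(layout, cells)}
    record["state"] = state  # keep the origin for the PAN India merge
    return record
```

Inside the Selenium loop, the cell texts would be passed as row_to_record("Maharashtra", [c.text for c in cols]).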

Step 6: Visit Each Project Page and Extract Details

Each portal shows the full record on a separate project page, so visit every details_url collected above and read its label/value table:

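import requests
from bs4 import BeautifulSoup

def scrape_project_details(url):
    """Return the label/value pairs from a project's details table, or {} on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
    except requests.RequestException:
        return {}

    soup = BeautifulSoup(r.text, "lxml")  # needs the lxml package
    table = soup.find("table", {"class": "table"})
    if table is None:  # layout differs from what we expect
        return {}

    data = {}
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) == 2:  # label cell + value cell
            data[tds[0].get_text(strip=True)] = tds[1].get_text(strip=True)
    return data

for entry in rera_records:
    url = entry["details_url"]
    entry["details"] = scrape_project_details(url) if url else {}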
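The labels in these detail tables are free text ("Project Name :", "Promoter's Name", and so on), and they rarely match across states. Before merging, it can help to turn them into stable snake_case keys. The helpers below are a hypothetical sketch; the exact label wording on each portal is not known here, so any label-to-schema mapping built on top of them still has to be checked per state.

```python
import re

def normalize_label(label: str) -> str:
    """Turn a portal label such as 'Project Name :' into a snake_case key."""
    label = label.strip().rstrip(":").strip().lower()
    label = re.sub(r"['’]", "", label)  # drop apostrophes: promoter's -> promoters
    return re.sub(r"[^a-z0-9]+", "_", label).strip("_")

def normalize_details(details: dict) -> dict:
    """Re-key a scraped details dict; later duplicates overwrite earlier ones."""
    clean = {}
    for label, value in details.items():
        key = normalize_label(label)
        if key:  # skip labels that were only punctuation
            clean[key] = value
    return clean
```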

Step 7: Handling PDF Documents (Layouts, Maps, Certificates)

Most RERA portals provide PDF documents:

Approved floor plans
Allotment letters
Legal certificates
Building permissions
Quarterly reports

To extract text:

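import io
import requests
from pypdf import PdfReader

def extract_pdf_text(pdf_url):
    """Download a PDF and return its text layer (empty for scanned, image-only pages)."""
    response = requests.get(pdf_url, timeout=60)
    response.raise_for_status()

    reader = PdfReader(io.BytesIO(response.content))  # parse in memory, no temp file
    # extract_text() yields little or nothing for scanned pages; guard against None
    return "\n".join((page.extract_text() or "") for page in reader.pages)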

Collect every PDF link on a project page:

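from urllib.parse import urljoin

def get_pdfs(soup, base_url):
    """Return absolute URLs for every PDF link in the page."""
    pdf_links = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if ".pdf" in href.lower():
            pdf_links.append(urljoin(base_url, href))  # resolve relative links
    return pdf_links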
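Raw PDF text only becomes useful once specific fields are pulled out of it. A regular-expression pass is one cheap first step. In this sketch the ID pattern ("P" followed by 11 digits, the shape assumed for Maharashtra registrations, e.g. the illustrative P51800012345) and the dd/mm/yyyy date format are both assumptions; other states and documents use other formats, and extract_fields is a hypothetical helper.

```python
import re
from datetime import date

# Assumption: Maharashtra-style registration IDs are "P" + 11 digits.
REG_ID = re.compile(r"\bP\d{11}\b")
# Assumption: dates in the PDFs are written dd/mm/yyyy.
DATE_DMY = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

def extract_fields(text: str) -> dict:
    """Pull registration IDs and dd/mm/yyyy dates out of extracted PDF text."""
    dates = []
    for d, m, y in DATE_DMY.findall(text):
        try:
            dates.append(date(int(y), int(m), int(d)))
        except ValueError:
            pass  # impossible dates such as 31/02/2020 are skipped
    return {"registration_ids": sorted(set(REG_ID.findall(text))), "dates": dates}
```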

Step 8: Normalize Project Names & Builder Names

The same project is often written differently from one portal or record to the next:

“Lodha Casa Rio Gold”
“Casa Rio Gold by Lodha Group”
“Lodha Group - Casa Rio Gold”

Start by normalizing the names so they can be fuzzy-matched reliably:

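import re
import pandas as pd

# Legal suffixes and filler words that differ between portals (extend as needed)
UNWANTED = ["pvt ltd", "private limited", "llp", "developers", "group", "builders"]

def normalize_name(name):
    name = str(name).lower().strip()
    for w in UNWANTED:
        # \b stops "group" from being cut out of longer words
        name = re.sub(rf"\b{re.escape(w)}\b", " ", name)
    return re.sub(r"\s+", " ", name).strip()  # collapse leftover spaces

df = pd.DataFrame(rera_records)
df["project_clean"] = df["project_name"].apply(normalize_name)
df["promoter_clean"] = df["promoter"].apply(normalize_name)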
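Normalization alone still leaves "lodha casa rio gold" and "casa rio gold by lodha" as different strings; the comparison step is what fuzzy matching adds. fuzzywuzzy's fuzz.token_set_ratio is the usual choice for word-order differences like the Lodha examples. The sketch below swaps in the standard library's difflib so it runs without extra packages: it loosely imitates the token-set idea (compare the shared words, then the leftovers) and is not fuzzywuzzy's exact algorithm. STOPWORDS and the helper names are assumptions to tune on your own data.

```python
import re
from difflib import SequenceMatcher

# Assumed filler words ignored when comparing names; tune for your data.
STOPWORDS = {"by", "the", "pvt", "ltd", "private", "limited", "llp",
             "developers", "group", "builders"}

def name_tokens(name: str) -> set:
    """Lowercase words and digits, minus filler words and punctuation."""
    return {w for w in re.findall(r"[a-z0-9]+", name.lower()) if w not in STOPWORDS}

def token_set_similarity(a: str, b: str) -> float:
    """0-100 score that ignores word order and filler words."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    common = " ".join(sorted(ta & tb))
    s1 = (common + " " + " ".join(sorted(ta - tb))).strip()
    s2 = (common + " " + " ".join(sorted(tb - ta))).strip()
    pairs = [(common, s1), (common, s2), (s1, s2)]
    return round(100 * max(SequenceMatcher(None, x, y).ratio() for x, y in pairs), 1)
```

As with token_set_ratio, a name whose words are a subset of the other's scores 100, so pairs above a cutoff (90 is only a starting point) are best reviewed before being merged automatically.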

Step 9: Merge Data From Different RERA States

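Once you have one DataFrame per state (Maharashtra, Karnataka, Gujarat, Delhi, etc.), tag and combine them:

import pandas as pd
frames = {"Maharashtra": mh_df, "Karnataka": ka_df, "Gujarat": gj_df, "Delhi": dl_df}
# Tag each row with its source state, then stack; ignore_index replaces reset_index
final_df = pd.concat([d.assign(state=s) for s, d in frames.items()], ignore_index=True)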
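pd.concat only stacks the tables; it does not remove projects scraped twice (for example when two district searches overlap, or a re-run is appended to earlier results). A minimal dedup pass, sketched here in plain Python over a list of dicts such as final_df.to_dict(orient="records") returns, keeps one record per (state, rera_id) and prefers the copy with the most filled-in fields. Keying on the state as well as rera_id is an assumption, made because registration numbers are issued by each state's authority; dedupe_records is a hypothetical helper.

```python
import math

def _filled(rec: dict) -> int:
    """Count fields that actually hold a value."""
    count = 0
    for v in rec.values():
        if v in (None, "", [], {}):
            continue
        if isinstance(v, float) and math.isnan(v):
            continue  # pandas uses NaN for missing cells
        count += 1
    return count

def dedupe_records(records: list) -> list:
    """Keep one record per (state, rera_id), preferring the most complete copy."""
    best = {}
    for rec in records:
        key = (rec.get("state", ""), str(rec.get("rera_id") or "").strip().upper())
        if key not in best or _filled(rec) > _filled(best[key]):
            best[key] = rec
    return list(best.values())
```

The result can go straight back into pandas with pd.DataFrame(dedupe_records(final_df.to_dict(orient="records"))).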

Step 10: Build a Unified Structure for Property Intelligence

Your final aggregated dataset should contain:

Field	Description
rera_id	Unique project registration number
project_name	Project name as published on the portal
project_clean	Normalized project name
promoter	Builder / developer name
promoter_clean	Normalized promoter name
state	Maharashtra, Gujarat, Delhi…
city/district	City or district of the project
start_date	Project start date as registered with RERA
end_date	Proposed completion date
status	Ongoing / Completed / Withdrawn
project_type	Residential / Commercial / Mixed
document_links	List of PDF URLs
pdf_text	Text extracted from the PDFs
qpr_updates	Quarterly progress reports (QPR)
litigation	Complaints and litigation, if any
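To keep every state's scraper emitting the same shape, the schema above can be pinned down as a Python dataclass. This is a sketch under assumptions: the city/district column is stored as district because a slash is not valid in a Python identifier, list-valued fields default to empty lists, and from_scraped is a hypothetical helper that maps the Step 5 record keys (including type to project_type). Its fallback for the clean names is a plain lowercase, standing in for the Step 8 normalizer.

```python
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class RERAProject:
    """One row of the unified PAN India dataset (fields follow the table above)."""
    rera_id: str
    project_name: str
    project_clean: str
    promoter: str
    promoter_clean: str
    state: str
    district: str = ""  # the table's city/district column
    start_date: str = ""
    end_date: str = ""
    status: str = ""
    project_type: str = ""
    document_links: List[str] = field(default_factory=list)
    pdf_text: str = ""
    qpr_updates: List[str] = field(default_factory=list)
    litigation: List[str] = field(default_factory=list)

def from_scraped(rec: dict, state: str) -> RERAProject:
    """Map a Step 5-style record onto the unified schema."""
    return RERAProject(
        rera_id=rec["rera_id"],
        project_name=rec["project_name"],
        # Placeholder: plain lowercase if the Step 8 normalizer hasn't run yet
        project_clean=rec.get("project_clean") or rec["project_name"].lower().strip(),
        promoter=rec["promoter"],
        promoter_clean=rec.get("promoter_clean") or rec["promoter"].lower().strip(),
        state=state,
        district=rec.get("district", ""),
        status=rec.get("status", ""),
        project_type=rec.get("type", ""),  # Step 5 stores this as "type"
        document_links=list(rec.get("document_links", [])),
    )
```

A list of these converts to a DataFrame with pd.DataFrame([asdict(p) for p in projects]).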

Step 11: Save the Final Output

final_df.to_csv("PAN_India_RERA_Data.csv", index=False)
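A CSV file has no list type, so pandas writes columns such as document_links and qpr_updates as their Python string form (something like "['a.pdf', 'b.pdf']"), and they come back as plain strings when the file is re-read. If those columns matter downstream, JSON Lines keeps them as real lists; pandas can write it directly with final_df.to_json(path, orient="records", lines=True). The standard-library sketch below does the same for a list of dicts; the helper names are hypothetical.

```python
import json

def to_jsonl(records: list) -> str:
    """One JSON object per line; default=str covers dates and other odd types."""
    return "".join(json.dumps(rec, ensure_ascii=False, default=str) + "\n" for rec in records)

def from_jsonl(text: str) -> list:
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def save_jsonl(records: list, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
```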

Step 12: Visualize Data for Real Estate Intelligence

Use dashboards for:

New projects launched per month
Builder performance
City-wise construction activity
Delay tracking
Litigation frequency
Project type distribution
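Two of these metrics reduce to small computations on the unified records. Delay tracking can flag a project whose registered end_date has passed while its status is neither Completed nor Withdrawn, and launches per month can be counted from start_date. The sketch assumes dates arrive as dd/mm/yyyy or yyyy-mm-dd strings; real portals may use other formats, and parse_date, is_delayed, and launches_per_month are hypothetical helpers.

```python
from collections import Counter
from datetime import date, datetime
from typing import Optional

def parse_date(value: str) -> Optional[date]:
    """Parse dd/mm/yyyy or yyyy-mm-dd (assumed portal formats); None otherwise."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None

def is_delayed(project: dict, today: date) -> bool:
    """True if the registered end date has passed and the project is still open."""
    end = parse_date(project.get("end_date", ""))
    status = project.get("status", "").strip().lower()
    return end is not None and end < today and status not in ("completed", "withdrawn")

def launches_per_month(projects: list) -> Counter:
    """Count projects by the year-month of their start date."""
    months = Counter()
    for p in projects:
        start = parse_date(p.get("start_date", ""))
        if start is not None:
            months[start.strftime("%Y-%m")] += 1
    return months
```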

Limitations of RERA Scraping

CAPTCHA prevents automation in some states

Requires AI-based solving or manual intervention.

Inconsistent HTML

Every state portal is different.

PDFs are inconsistent

Different formats, scanned documents.

Slow portals

Some state portals run on slow, outdated servers.

Rate limits

Frequent requests can get your IP address blocked.
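For slow portals and rate limits, a common mitigation is to retry failed requests with exponential backoff plus random jitter while keeping the overall request rate low. The helper below is a generic sketch, not tied to any portal: fetch_with_backoff, its default delays, and the retry_on parameter are assumptions, and the sleep function is injectable so the waits can be tested without actually sleeping.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def fetch_with_backoff(
    fetch: Callable[[str], T],
    url: str,
    retries: int = 4,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url); on a retryable error wait base_delay * 2**attempt
    plus up to base_delay of random jitter, then try again."""
    attempt = 0
    while True:
        try:
            return fetch(url)
        except retry_on:
            if attempt >= retries:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
            attempt += 1
```

With requests, fetch could be a small function that calls requests.get(url, timeout=30) and then raise_for_status(), paired with retry_on=(requests.RequestException,) so timeouts and 429/5xx responses trigger a retry. Note that this also retries 404s; check the status code first if that matters.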

When to Use Actowiz Solutions Instead of DIY?

DIY works for:

small research projects
state-level scraping
a few hundred projects

But use Actowiz Solutions for:

PAN India property intelligence
Daily/weekly updates
Large-scale RERA dashboards
Advanced PDF parsing
Builder-level performance analytics
Market forecasting models
Automated ETL pipelines
API-based RERA data feeds

We support all 28 states & UTs with RERA coverage.

Conclusion

This tutorial taught you how to:

Scrape RERA portals using Selenium
Extract project listings
Parse detailed project pages
Download and extract PDF contents
Normalize builder & project names
Combine PAN India datasets
Generate property intelligence tables

RERA data is one of the richest sources of real estate truth — and with the right pipelines, it becomes a powerful engine for analytics, compliance, and investment intelligence.

You can also reach us for all your mobile app scraping, data collection, web scraping, and instant data scraper service requirements!

Hear It Directly from Our Clients

Watch how businesses like yours are using Actowiz data to drive growth.

"Actowiz Solutions offered exceptional support with transparency and guidance throughout. Anna and Saga made the process easy for a non-technical user like me. Great service, fair pricing!"

Thomas Galido

Co-Founder / Head of Product at Upright Data Inc.

"Actowiz delivered impeccable results for our company. Their team ensured data accuracy and on-time delivery. The competitive intelligence completely transformed our pricing strategy."

Iulen Ibanez

CEO / Datacy.es

"What impressed me most was the speed — we went from requirement to production data in under 48 hours. The API integration was seamless and the support team is always responsive."

Febbin Chacko

-Fin, Small Business Owner