Exploratory Data Analysis of Housing Rental Market in Germany with Python

Germany has the largest economy in Europe, and its landscapes and culture make it a popular tourist destination. An exploratory data analysis of the German housing rental market is useful both for data analysts and for people who are deciding to live in the country.

This blog uses Python, Pandas, and Bokeh to collect and analyze rental housing data.

Data Collection

For data collection, we use ImmoScout24, one of the largest and oldest real estate websites, listing more than 72,000 apartments and houses. The website has an API and a page for developers. However, here we will scrape the real estate data using Python.

Before collecting data, make sure to seek permission from the site owner. Never use several threads at a time, so that you do not overload the server. For debugging your code, use saved HTML files.

For exploratory data analysis with Python, we will first try to get the page data using requests.

But this is not enough, because the page is protected against robots. Hence, we use the Selenium Python library, which drives a real Chrome browser, to automate reading the pages and save the data.

As soon as the code runs, a browser window opens. Before processing the first page, we added a 30-second delay to leave time to confirm that we are not a robot. Within this interval, you can also press the three dots at the right to open the browser settings and disable the loading of images.

The browser stays open during requests for the following pages, and there is no further robot check. After getting the HTML body, extracting the housing rental data becomes easy. Use the Inspect button in the browser to find the HTML element properties.

We will get these elements in Python using the BeautifulSoup library. The code extracts all the apartment URLs from the page.
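
The original code image is not available, so here is a minimal sketch of how such a URL extraction could look. The sample HTML, the `/expose/` link prefix, the base URL, and the function name `extract_listing_urls` are all assumptions made for illustration; inspect the real page to find the right elements.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a saved result page.
SAMPLE_HTML = """
<ul>
  <li><a href="/expose/111">Flat A</a></li>
  <li><a href="/expose/222">Flat B</a></li>
  <li><a href="/expose/111">Flat A (duplicate link)</a></li>
  <li><a href="/impressum">Imprint</a></li>
</ul>
"""

def extract_listing_urls(html, base_url="https://example.com", prefix="/expose/"):
    """Return unique absolute listing URLs in page order."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        # Keep only links that look like listing pages (assumed prefix).
        if href.startswith(prefix):
            full_url = base_url + href
            if full_url not in urls:
                urls.append(full_url)
    return urls

listing_urls = extract_listing_urls(SAMPLE_HTML)
print(listing_urls)
```

Working on a saved HTML string like this also makes it easy to debug the parser without sending new requests to the server.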

Let’s find the type of data we need.

Data Fields

For each estate object, we will have a page like this. The values and the name of the company are blurred.

Below are the types of data we can get:

Title : In the above picture, the title refers to a single apartment in Hermsdorf. This free text is not very helpful for data analysis.

Type : The type is Etagenwohnung (an apartment on a floor of a multi-story building).

Kaltmiete or cold price : The rental price excluding utility costs, like electricity or heating.

Warmmiete or warm price : The rental price including heating costs and certain other costs.

Etage or floor : On this page, the text is 0-3, so a little parsing is needed. In Germany, the first floor is the first elevated floor, so 0 denotes the ground floor. From the text 0-3, we can also extract the total number of floors in the building.

Kaution (deposit) : Here, we find a value of 3-Kaltmieten. Specific parsing is needed for this field as well.

Fläche (area) : The area of the house or apartment.

Zimmer (rooms) : The number of rooms. Here, it is 1.
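
The floor field described above can be parsed with a small helper. This is only a sketch: the function name `parse_floor` is hypothetical, and it assumes the text contains one or two non-negative integers (floor, then total floors) separated by any non-digit characters.

```python
import re

def parse_floor(text):
    """Parse a floor string such as '0-3' into (floor, total_floors).

    Returns (None, None) when no number is found, and (floor, None)
    when only one number is given. Negative floors are not handled.
    """
    if not text:
        return None, None
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if not numbers:
        return None, None
    floor = numbers[0]
    total_floors = numbers[1] if len(numbers) > 1 else None
    return floor, total_floors

print(parse_floor("0-3"))
print(parse_floor("2"))
```

Returning `None` instead of raising an error keeps the scraper running when a listing has no floor information at all.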

You can also extract several other data fields, such as extra rent for a garage, pet allowances, etc. The HTML parsing process is exactly the same as before. To obtain the property title, we will use the code below.

We extract the other fields similarly. After running the code for all pages, we obtain a dataset like this and save it in CSV format.

Let’s see what information we can get from it.

Data Cleaning & Transformation

The housing data will require cleaning and transformation to obtain a structured format.

We collected the data from 6 cities in different parts of Germany: Berlin, Frankfurt, Munchen, Koln, Hamburg, and Dresden. We will start with Berlin. We will first load the CSV into a Pandas data frame.

During scraping, "None" was written to the CSV for all missing values. As we don't want these strings in the data, we specify "None" as 'na_values.' For the separator, we used "." and set 'pd.Int32Dtype' for integer fields, including floor number and price. The output will look like this:
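
A minimal sketch of this loading step is shown below. The column names and the inline sample data are assumptions, and the `";"` separator is used here only so that the decimal area values stay intact; substitute the delimiter and file path of your own CSV.

```python
import io
import pandas as pd

# Hypothetical sample in place of the real Berlin file; ";" is an assumed separator.
CSV_TEXT = """property_id;type;price_cold;price_warm;floor;area;rooms
1;Etagenwohnung;900;1100;2;60.5;2
2;None;1500;None;None;85.0;3
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    sep=";",
    na_values=["None"],  # treat the literal string "None" as missing
    dtype={
        # Nullable integer type: keeps whole numbers even when values are missing.
        "price_cold": pd.Int32Dtype(),
        "price_warm": pd.Int32Dtype(),
        "floor": pd.Int32Dtype(),
    },
)
print(df.dtypes)
print(df)
```

The nullable `Int32` type is the reason for the explicit `dtype` mapping: with the default settings, a column containing missing values would be converted to floats.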

We will then check for dimensionality and the number of NULL values.

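
As a sketch, both checks take one line each in Pandas. The small data frame below is made up for illustration and only mimics the structure of the real dataset.

```python
import pandas as pd

# Hypothetical miniature dataset with some missing values.
df = pd.DataFrame({
    "type": ["Etagenwohnung", None, None],
    "floor": pd.array([2, None, 0], dtype="Int32"),
    "price_cold": pd.array([900, 1500, 700], dtype="Int32"),
})

print(df.shape)         # (number of rows, number of columns)
print(df.isna().sum())  # number of missing values per column
```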

The output will appear like this:


The above image shows that the total number of properties in Berlin is 3556. Each property has cold and warm prices, number of rooms, area, etc. For 2467 properties, the ‘type’ is missing. There is no floor value for 2200 properties, and so on. We will also require a method to convert text strings like ‘3 Nettokaltmieten’ to numeric values.

Basic Analysis

We will use the Pandas method ‘describe’ to get descriptive statistics of the dataset.


We removed the ‘property id’ from the results and adjusted the output by adding a ‘thousand’ separator. The Berlin results will appear like this.

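
A sketch of this step, assuming hypothetical column names and made-up values, could look like this:

```python
import pandas as pd

# Hypothetical sample; the real dataset has thousands of rows.
df = pd.DataFrame({
    "property_id": [101, 102, 103, 104],
    "price_cold": [900, 1500, 2300, 12000],
    "area": [45.0, 60.0, 82.5, 210.0],
    "rooms": [1, 2, 3, 5],
})

# The ID is just a label, so its statistics are meaningless and are dropped.
stats = df.drop(columns=["property_id"]).describe()

# Format numbers with a thousands separator and no decimals for display.
print(stats.to_string(float_format=lambda v: f"{v:,.0f}"))
```

The `50%` row of the result is the median, and the `75%` row is the 75th percentile discussed in the text.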

From the above image, we can see that 3,556 properties are available in Berlin. The median (50th percentile) area for those 3,556 properties is 60 square meters, and the median price is €1,645. The 75th percentile price is €2,271, which indicates that 75% of the properties are cheaper than this value. The average number of rooms is 2.

In the next step, we will make a scatter matrix for specific fields like number of rooms, property area, and price. We will again use Pandas for this.

The data plotted on the histogram will appear like this.


For other visualizations, we will use the Bokeh library, which makes beautiful and interactive graphs. First, we will import the necessary modules.

Property Types

We collected data from 6 different cities in Germany, saved them to CSV files, and combined them all in a single data frame.

Now, we will find the property types distribution:


After replacing the ‘NA’ values with ‘unknown,’ we grouped the property types by value and sorted the result by count. Then, to avoid the default blue bars in Matplotlib style, we specified a color palette. The final output will appear like this:
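
The counting part of this step can be sketched in Pandas as follows. The sample values are made up, and the plotting with a custom color palette is left out.

```python
import pandas as pd

# Hypothetical sample of the 'type' column with missing entries.
df = pd.DataFrame({
    "type": ["Etagenwohnung", None, "Penthouse", "Etagenwohnung", None, None],
})

type_counts = (
    df["type"]
    .fillna("unknown")   # label missing types explicitly
    .value_counts()      # counts per type, sorted in descending order
)
print(type_counts)
```

`value_counts` groups and sorts in one call, so the resulting index and values can be passed directly to a bar chart.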

From the above image, many properties have no type. Among the known types, the apartment on a floor (Etagenwohnung) is the most popular one. The third and fourth types are under-the-roof and ground-floor apartments.

Now, let’s find the price distribution by type and combine the results in Pandas.

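
One possible way to compute such a table is to combine `groupby` with `describe`. The column names and prices below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample of property types and cold prices.
df = pd.DataFrame({
    "type": ["Penthouse", "Penthouse", "Etagenwohnung", "Etagenwohnung", "Etagenwohnung"],
    "price_cold": [3000, 5000, 1000, 1400, 1800],
})

price_by_type = (
    df.groupby("type")["price_cold"]
    .describe()                            # count, mean, std, min, quartiles, max
    .sort_values("50%", ascending=False)   # most expensive type (by median) first
)
print(price_by_type)
```

The quartile columns of this table are the same values that a box-and-whisker plot draws for each type.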

The results in the table form will appear like this:


The box-and-whisker plot gives the visual form of the results like this:

The penthouses are the most expensive, followed by standard apartments, under-the-roof, and ground-floor apartments.

Property Prices

Price Per Area

A scatter plot helps us understand what property size is available for rent at a specific price. It requires only two arrays, X and Y. Here, however, we will first create a list of property types and amounts.

We will create three different arrays for a specific city: the area in square meters, the type, and the price.

Here, we substituted NULL property types with ‘Unbekannt’ (unknown), which is not required for the scatter plot itself but for the graph. We will create a linear regression model and train it using the data points. It will help in drawing a linear approximation:
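
The library used for the regression is not visible in the source, so the sketch below uses NumPy's `polyfit` as a lightweight stand-in for a linear regression model. The data points are synthetic.

```python
import numpy as np

# Synthetic data: price grows linearly with area (20 per square meter plus 300).
area = np.array([30.0, 50.0, 70.0, 100.0])
price = 20.0 * area + 300.0

# A degree-1 polynomial fit is an ordinary least-squares line.
slope, intercept = np.polyfit(area, price, deg=1)

# Two end points are enough to draw the approximation line on the scatter plot.
line_x = np.array([area.min(), area.max()])
line_y = slope * line_x + intercept
print(slope, intercept)
```

The slope can be read as an approximate price per additional square meter, which makes cities easy to compare.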

We will draw the results:


We will put the code in a separate get_figure_price_petr_area method to display different cities on the graph. Combining them in rows and columns, we will draw several Bokeh figures.


The plotted results will look like this:


We will visually compare the number of properties available in the market.

Price and Area Histograms

Using a histogram, we can see the prices more compactly. The NumPy histogram method will perform all the calculations.
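
As a sketch with made-up prices, the calculation itself is a single `np.histogram` call. The bin count and range are assumptions chosen to keep the example readable.

```python
import numpy as np

# Hypothetical cold prices in Euro.
prices = np.array([650, 900, 1200, 1450, 1500, 1550, 2100, 2900, 4800])

# Five equal bins of 1,000 Euro each between 0 and 5,000.
counts, edges = np.histogram(prices, bins=5, range=(0, 5000))
print(counts)
print(edges)
```

The function returns the counts per bin and the bin edges. For drawing, the left and right sides of each bar are `edges[:-1]` and `edges[1:]`, and the bar height is the corresponding count.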

We used the same approach to draw the graph for several cities together:

The results correlate with the scatter plot.


Munchen is the most expensive place, with a distribution peak near €1,500, while the distribution for Berlin has two peaks. For the area in square meters, we will show the results only for Berlin.

Several houses and apartments have an area of 30 to 70 square meters. Some properties are smaller than 10 square meters, while some are larger than 250 square meters.

Utility Costs

All apartments have two prices: warm and cold. We will calculate the difference and draw a scatter plot.
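
A minimal sketch of the calculation, with assumed column names and made-up prices, is shown below. Rows with a missing warm or cold price are dropped before plotting.

```python
import pandas as pd

# Hypothetical sample; one listing has no warm price.
df = pd.DataFrame({
    "area": [50.0, 100.0, 75.0],
    "price_cold": pd.array([800, 1500, 1100], dtype="Int32"),
    "price_warm": pd.array([1000, 1900, None], dtype="Int32"),
})

# Utility costs are the difference between the warm and the cold price.
df["utility_cost"] = df["price_warm"] - df["price_cold"]

# Keep only rows where the difference could be computed.
valid = df.dropna(subset=["utility_cost"])
x = valid["area"].to_numpy()
y = valid["utility_cost"].to_numpy(dtype=float)
print(x, y)
```

The arrays `x` and `y` are what a scatter plot of area against utility costs needs.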

From the above image, we see that the results vary a lot. Different types of houses have different insulation, heating, etc. A 50 square meter property has utility costs of nearly 200 Euro per month. As the area doubles, the costs roughly double.

Deposit

First, we will find out the type of data:


The results will appear like this:


Displaying unique values is easy. From the above image, we can see that the values differ a lot. Some owners give the amount as a number like ‘585 Euro’, while others use text abbreviations like ‘3 MM’.

The output shows text descriptions like ‘Drei Nettokaltmieten,’ ‘Zwei Monatsmiete,’ and so on. For parsing the values, we created two methods that transform a text string into numerical values.
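
The two original methods are not shown in the source, so the following is only a sketch of how such a conversion could work. The function names, the small number vocabulary, and the rule that any text mentioning "miete" or "MM" is a multiple of the cold rent are all assumptions.

```python
import re

# Assumed vocabulary; extend it with the spellings found in your own data.
GERMAN_NUMBERS = {"ein": 1, "eine": 1, "zwei": 2, "drei": 3}

def words_to_digits(text):
    """Replace German number words with digits, e.g. 'Drei ...' -> '3 ...'."""
    def replace(match):
        return str(GERMAN_NUMBERS[match.group(0).lower()])
    pattern = r"\b(" + "|".join(GERMAN_NUMBERS) + r")\b"
    return re.sub(pattern, replace, text, flags=re.IGNORECASE)

def parse_deposit(text, cold_rent):
    """Return the deposit in Euro, or None if the text cannot be parsed."""
    if not text:
        return None
    normalized = words_to_digits(text.strip())
    # Drop German thousands separators: '1.500' -> '1500'.
    normalized = re.sub(r"\.(?=\d{3}\b)", "", normalized)
    match = re.search(r"\d+(?:[.,]\d+)?", normalized)
    if match is None:
        return None
    value = float(match.group(0).replace(",", "."))
    # Texts mentioning rents ('Kaltmieten', 'Monatsmiete', 'MM') are multipliers.
    if re.search(r"miete|\bMM\b", normalized, flags=re.IGNORECASE):
        return value * cold_rent
    return value

print(parse_deposit("Drei Nettokaltmieten", 500))
print(parse_deposit("585 Euro", 400))
```

Free-text values that contain no number at all, such as a note that the deposit is negotiable, come back as `None` and can be treated as missing.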

Using these methods, you can do the conversion like this:


Creating a column in the dataset with a deposit-to-price ratio is now easy.

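
As a sketch, assuming the deposits have already been converted to Euro amounts in a hypothetical `deposit_eur` column, the ratio is a simple column division:

```python
import pandas as pd

# Hypothetical sample; the last listing has no parsable deposit.
df = pd.DataFrame({
    "price_cold": [500.0, 800.0, 1000.0],
    "deposit_eur": [1500.0, 1600.0, None],
})

# Deposit expressed as a multiple of the monthly cold rent.
df["deposit_ratio"] = df["deposit_eur"] / df["price_cold"]
print(df)
```

Missing deposits stay missing in the new column, so they do not distort the histogram.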

Using this new column, you can easily plot the histogram:


Property Owners

Some owners prefer to rent out their properties on their own, while others seek an agency's help. To understand this, we will draw the distribution in a pie chart.

In the above code, the data frame is grouped by publisher, and the results are sorted by group size. We use different colors for the groups.
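
The grouping step can be sketched as follows. The `publisher` column name and the sample values are assumptions, and the pie chart drawing itself is omitted.

```python
import pandas as pd

# Hypothetical sample of listing publishers.
df = pd.DataFrame({
    "publisher": ["Agency A", "Private", "Agency A", "Agency B", "Agency A", "Private"],
})

# Number of listings per publisher, largest group first.
by_publisher = df.groupby("publisher").size().sort_values(ascending=False)

# Share of each publisher in percent, as needed for pie chart sectors.
shares = (by_publisher / by_publisher.sum() * 100).round(1)
print(shares)
```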

The results for Berlin and Munchen will appear like this:

In Berlin, 8.5% of the real estate listings are published by private individuals. In Munchen, it is 27%. A few agencies publish more than 50% of the properties.

Floor Numbers

Several houses and apartments do not have a specific floor number. In such cases, we marked the value as ‘unknown’ and sorted the data with a custom key in Pandas. The challenging part is that, when sorting a DataFrame, Pandas applies the custom key not to a single value but to the whole ‘pd.Series’ object. Hence, we need a second method to update the values in the series.
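
The two-method pattern described here can be sketched like this. The function names and the sample data are hypothetical; the important part is that the `key` argument of `sort_values` receives a whole Series.

```python
import pandas as pd

# Hypothetical counts per floor label; labels are strings because of 'unknown'.
df = pd.DataFrame({
    "floor": ["10", "2", "unknown", "0"],
    "count": [5, 40, 12, 30],
})

def floor_to_number(value):
    """Map one floor label to a sortable number; 'unknown' goes last."""
    return int(value) if value.isdigit() else 10_000

def floor_sort_key(series):
    """sort_values passes the whole Series to `key`, so map it element-wise."""
    return series.map(floor_to_number)

df_sorted = df.sort_values(by="floor", key=floor_sort_key)
print(df_sorted)
```

Without the key, the string labels would be sorted alphabetically, which puts "10" before "2".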

The results for Berlin and Munchen will appear like this:


We can see that most apartments in both cities are on the 1st to 5th floors. However, several apartments are on floors 10 to 20, and, exceptionally, one apartment in Berlin is listed on the 87th floor.

Geo Visualization

So far, we have built histograms. Here, we will display estate objects on a geographic map. We may face two challenges: getting the coordinates and drawing the map.

Geocoding

Let’s check our data again. The data frame has fields like address and region. These fields can be used for geocoding.

To find the coordinates, let’s use the GeoPy library.


Although this was very simple, removing the “(” and “)” brackets from the addresses was a significant challenge. Using the ‘lru_cache’ decorator, repeated requests for the same location are served from the cache.
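
The sketch below illustrates the cleaning and caching idea without calling a real geocoding service. The `fake_geocode` function is a stand-in for an actual geocoder call, and the address and coordinates are rough illustrative values.

```python
import re
from functools import lru_cache

def clean_address(address):
    """Remove '(' and ')' characters and collapse repeated whitespace."""
    without_brackets = address.replace("(", " ").replace(")", " ")
    return re.sub(r"\s+", " ", without_brackets).strip()

# Records every lookup so the effect of caching is visible.
lookup_log = []

def fake_geocode(address):
    """Stand-in for a real geocoder request (assumption for this sketch)."""
    lookup_log.append(address)
    known = {"Alexanderplatz 1, Berlin": (52.52, 13.41)}  # rough coordinates
    return known.get(address)

@lru_cache(maxsize=None)
def get_coordinates(address):
    """Return (lat, lon) for an address, caching results across calls."""
    return fake_geocode(clean_address(address))

first = get_coordinates("Alexanderplatz 1, (Berlin)")
second = get_coordinates("Alexanderplatz 1, (Berlin)")  # answered from the cache
print(first, len(lookup_log))
```

Many listings share the same street or district, so caching noticeably reduces the number of requests sent to the geocoding service.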

Map

For drawing the map, we will use the free Folium library. Displaying a map with a marker takes only several lines of code:

The code gives a clear, interactive map without any API key:

We will use Folium's Circle for each property and group the prices with the help of ‘FeatureGroup.’

We have also used a heatmap to make the results look much better. The final results will appear like this:


The real estate objects priced at more than 5000 Euro/m are distributed evenly. The result is more or less predictable: in Berlin, areas surrounding the center are more expensive.

Rent Dynamics

How quickly are properties rented, and how long do they stay available? This question is hard to answer directly, but we can estimate it by comparing the results from different days. Each property has a unique ID. We will save the data for the same city with an interval of 7 days and display two price histograms: one for all properties and the other for those removed within seven days.
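
Finding the properties that disappeared between two snapshots can be sketched with `isin`. The two small data frames and their column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical snapshots of the same city taken 7 days apart.
week1 = pd.DataFrame({"property_id": [1, 2, 3, 4], "price_cold": [800, 1200, 1500, 3000]})
week2 = pd.DataFrame({"property_id": [2, 4, 5], "price_cold": [1200, 3000, 950]})

# Listings present in the first snapshot but missing from the second one.
removed = week1[~week1["property_id"].isin(week2["property_id"])]

# Share of listings that left the market within the interval, in percent.
removed_share = len(removed) / len(week1) * 100
print(removed)
print(removed_share)
```

A removed listing is only a proxy for a rented property, since an owner may also withdraw an offer for other reasons.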

To make the bars more readable, let’s add the percentage labels. The result will appear like this:


Anomalies Detection

In this step, we will find some anomalies, that is, unusual and non-standard properties. For this, let’s use the Isolation Forest algorithm. We will use three features: area, price, and number of rooms.

In the above code, the algorithm needs only one parameter, known as contamination, which determines the proportion of outliers. Let's set it to 1%. We get the result after using the 'fit' method. The 'decision_function' method returns the anomaly score. The 'predict' method returns +1 if the object is an inlier and -1 if it is an outlier.
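
A minimal sketch of these calls with scikit-learn is shown below. The data is synthetic: 200 ordinary properties plus one deliberately extreme property appended at the end, and the `random_state` is fixed only to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic features: area, price, and number of rooms for 200 ordinary properties.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(60, 10, n)
price = area * 20 + rng.normal(0, 50, n)
rooms = np.clip(np.round(area / 30), 1, None)
X = np.column_stack([area, price, rooms])

# Append one deliberately extreme property as the last row.
X = np.vstack([X, [500.0, 20000.0, 12.0]])

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(X)

scores = model.decision_function(X)  # lower (negative) means more anomalous
labels = model.predict(X)            # +1 for inliers, -1 for outliers
print(labels[-1], scores[-1])
```

With a contamination of 1%, roughly one in a hundred objects is labeled as an outlier, so the parameter directly controls how many properties are flagged.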

The result is:


To explain the results graphically, let’s seek the help of the SHAP Python package.


Let’s examine the property with the number 3030.

We found that the price was acceptable, but the algorithm treated the area of 211 square meters and the 5 rooms as unusual. By displaying a scatter plot, let’s check how the algorithm works. Let’s see how the number of rooms and the price impact the Shapley values.

The result will appear like this:


Here, we can see that a number of rooms above 4 affects the score the most.

Word Cloud

Here, we will find which words are most popular in the estate titles:

Using a Python WordCloud library, we will do this in several lines of code:

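
The counting behind a word cloud can be sketched with the standard library alone. The sample titles and the tiny stop-word list are made up, and the rendering of the cloud image is left out.

```python
import re
from collections import Counter

# Hypothetical listing titles.
titles = [
    "Helle Wohnung mit Balkon",
    "Moderne Wohnung in Berlin",
    "Schöne helle 2-Zimmer Wohnung",
]
STOPWORDS = {"mit", "in"}  # assumed minimal stop-word list

words = []
for title in titles:
    # Lowercase the title and keep only alphabetic words, including umlauts.
    for word in re.findall(r"[a-zäöüß]+", title.lower()):
        if word not in STOPWORDS:
            words.append(word)

frequencies = Counter(words)
print(frequencies.most_common(3))
```

A word-to-count mapping like `frequencies` is the kind of input a word cloud renderer uses to decide how large each word is drawn.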

The result will appear like this:

Words like apartment, room, bright, modern, beautiful, and balcony are among the most frequent ones we see.

For more information, get in touch with Actowiz Solutions now! You can also reach us for all your web scraping service and mobile app data scraping service requirements.
