Exploratory Data Analysis of Housing Rental Market in Germany with Python

Germany has the largest economy in Europe, along with striking landscapes and a rich culture, and it is a popular destination for visitors from around the world. An exploratory data analysis of the German rental housing market is useful both for data analysts and for people planning to live in the country.

This blog uses Python, Pandas, and Bokeh to collect and analyze rental housing data.

Data Collection

For data collection, we use ImmoScout24, one of the largest and oldest real estate websites, which lists more than 72,000 apartments and houses. The website has an API and a page for developers. However, we will scrape the real estate data using Python.

Before collecting data, seek permission from the website owner. Never use several threads at a time, so that you do not overload the server. For debugging your code, use saved HTML files.

For exploratory data analysis with Python, we will first get the page data using requests.

But requests alone is not enough because the page has protection against robots. Hence, we use the Selenium Python library, which drives a real Chrome browser to automate reading the pages and saving the data.

As soon as the code runs, a browser window opens. Before processing the first page, we added a 30-second delay so that we could confirm that we were not a robot. Within this interval, press the three dots at the right to open the browser settings and disable the loading of images.

The browser stays open during the requests for the following pages, and there is no robot check for further data. After getting the HTML body, extracting the rental housing data becomes easy. Use the Inspect button to find the HTML element properties.

We will get these elements in Python using the BeautifulSoup library. The code will extract all the apartment URLs from the page.
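
The original code for this step is not reproduced here, so the following is only a minimal sketch of the idea. The sample HTML, the `result-link` class name, the `/expose/...` paths, and the `example.com` base URL are all hypothetical; the real page markup has to be looked up with the Inspect button.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet of a saved result page; the real markup will differ.
SAMPLE_HTML = """
<ul>
  <li><a class="result-link" href="/expose/111">Apartment A</a></li>
  <li><a class="result-link" href="/expose/222">Apartment B</a></li>
  <li><a class="result-link" href="/expose/111">Apartment A (duplicate)</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""


def extract_listing_urls(html, base_url="https://example.com"):
    """Return the unique listing URLs found in the HTML, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # Only anchors with the (assumed) listing class and an href are considered.
    for link in soup.find_all("a", class_="result-link", href=True):
        url = base_url + link["href"]
        if url not in urls:
            urls.append(url)
    return urls


listing_urls = extract_listing_urls(SAMPLE_HTML)
print(listing_urls)
```

Working on a saved HTML string like this also makes it possible to debug the parsing without sending new requests to the server.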

Let’s look at the types of data we need.

Data Fields

Each real estate listing has a page like this. The price values and the company name are blurred.

Below are the types of data we can get:

Title : In the above picture, it is a single apartment in Hermsdorf. However, this free text is of limited use for data analysis.

Type : The type is Etagenwohnung (an apartment on an upper floor).

Kaltmiete or cold price : The rental price excluding utility costs, like electricity or heating.

Warmmiete or warm price : The rental price including heating and certain other costs.

Etage or floor : On this page, the text is 0-3, so a little parsing is needed. In Germany, the first floor is the first elevated floor, so 0 is the ground floor. From the text 0-3, we can also extract the total number of floors in the building.

Kaution (deposit) : Here, we find a value of 3 Kaltmieten (three cold rents). Specific parsing is needed.

Fläche (area) : The area of the house or apartment.

Zimmer (rooms) : The number of rooms. Here, it is 1.
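
The floor field described above can be split with a small helper. This is a sketch that assumes the text always contains the floor first and the total number of floors second, as in 0-3; the function name `parse_floor` is hypothetical, and basement floors with a minus sign are not handled.

```python
import re


def parse_floor(text):
    """Split a floor string such as '0-3' into (floor, total_floors).

    Returns (None, None) when no number is found and (floor, None)
    when only one number is present.
    """
    if not text:
        return None, None
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if not numbers:
        return None, None
    floor = numbers[0]
    total_floors = numbers[1] if len(numbers) > 1 else None
    return floor, total_floors


print(parse_floor("0-3"))
```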

You can also extract several other data fields, such as extra rent for a garage or whether pets are allowed. The HTML parsing process is exactly the same as before. To obtain the property title, we will use the code below.

We extract the other fields similarly. After running the code for all pages, we obtain a dataset like this and save it in CSV format.

Let’s see what information we can get from it.

Data Cleaning & Transformation

The housing data requires cleaning and transformation to obtain a structured format.

We have collected the data from 6 cities in different parts of Germany: Berlin, Frankfurt, Munchen, Koln, Hamburg, and Dresden. We will start with Berlin. We first load the CSV into a Pandas data frame.

The parsing was done in Python, so "None" was written to the CSV for all missing values. As we don't want "None" as a value, we specify it in 'na_values'. For the separator, we used "." and set 'pd.Int32Dtype' for integer fields, including floor number and price. The output will look like this:
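
As a rough illustration of the loading step, the sketch below reads a tiny in-memory CSV. The column names, the sample rows, and the ";" separator are assumptions made for this example only; use the separator and columns of your own file.

```python
import io

import pandas as pd

# Hypothetical sample; column names and the ';' separator are assumptions.
CSV_TEXT = """property_id;type;price_cold;price_warm;rooms;area;floor
1;Etagenwohnung;900;1100;2;60.5;3
2;None;1500;None;3;85.0;None
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    sep=";",
    na_values=["None"],  # treat the literal text "None" as a missing value
    dtype={
        # Nullable integer type, so missing values do not force floats.
        "price_cold": pd.Int32Dtype(),
        "price_warm": pd.Int32Dtype(),
        "floor": pd.Int32Dtype(),
    },
)

print(df.dtypes)
print(df.shape)         # dimensionality: (rows, columns)
print(df.isna().sum())  # number of missing values per column
```

The nullable `Int32` type keeps prices and floors as integers even when some rows are missing, which a plain NumPy integer column cannot do.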

We will then check the dimensionality and the number of NULL values.

The output will appear like this:

The above image shows that the total number of properties in Berlin is 3,556. Each property has cold and warm prices, number of rooms, area, etc. For 2,467 properties, the ‘type’ is missing. The floor value is missing for 2,200 properties, and so on. We will also need a method to convert text strings like ‘3 Nettokaltmieten’ to numeric values.

Basic Analysis

We will use the Pandas method ‘describe’ to get descriptive statistics of the dataset.

We removed the ‘property id’ from the results and adjusted the output by adding a thousands separator. The Berlin results will appear like this.
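
One possible way to produce such a table is sketched below. The data frame and its column names (`property_id`, `price_warm`, `area`) are made up for the example, and formatting the numbers as strings is just one of several ways to get a thousands separator.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
df = pd.DataFrame({
    "property_id": [101, 102, 103, 104],
    "price_warm": [1200, 1645, 2271, 3500],
    "area": [45.0, 60.0, 82.5, 120.0],
})

# The ID is only a label, so its statistics carry no meaning.
stats = df.drop(columns=["property_id"]).describe()

# Render every value as a string with a thousands separator and no decimals.
formatted = stats.apply(lambda col: col.map("{:,.0f}".format))
print(formatted)
```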

From the above image, we can see that 3,556 properties are available in Berlin. The median (50th percentile) area of those 3,556 properties is 60 square meters, and the median price is €1,645. The 75th percentile is €2,271, which means that 75% of the properties are cheaper than this value. The average number of rooms is 2.

In the next step, we will make a scatter matrix for specific fields: number of rooms, property area, and price. We will again use Pandas for this.

The plotted data will appear like this.

For other visualizations, we will use the Bokeh library, which makes attractive, interactive graphs. First, we will import the necessary modules.

Property Types

We collected data from 67 different cities in Germany, saved it to CSV files, and combined them all in a single data frame.

Now, we will find the distribution of property types:

After replacing the ‘NA’ values with ‘unknown,’ we grouped the data by property type and sorted the result by count. Then, to avoid the default Matplotlib-style blue bars, we specified a color palette. The final output will appear like this:
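
The grouping described above can be sketched in Pandas as follows. The sample rows and the `type` column name are assumptions, and the plotting part is left out.

```python
import pandas as pd

# Hypothetical sample; None stands for listings without a type.
df = pd.DataFrame({
    "type": ["Etagenwohnung", None, "Penthouse", "Etagenwohnung", None, None],
})

type_counts = (
    df.assign(type=df["type"].fillna("unknown"))  # replace missing types
      .groupby("type")
      .size()                                     # listings per type
      .sort_values(ascending=False)               # most frequent first
)
print(type_counts)
```

The resulting index (type names) and values (counts) are what a bar chart needs as its two inputs.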

From the above image, many properties have no type. Among those that do, the apartment on an upper floor (Etagenwohnung) is the most popular. The third and fourth types are under-the-roof and ground-floor apartments.

Now, let’s find the price distribution by type and combine the results in Pandas.
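
A minimal sketch of such a per-type price summary is shown below. It uses `groupby` with `describe`, which is an assumption about the approach, and the sample prices and column names are invented.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
df = pd.DataFrame({
    "type": ["Penthouse", "Penthouse", "Etagenwohnung",
             "Etagenwohnung", "Etagenwohnung", None],
    "price_warm": [3000, 4000, 1000, 1200, 1400, 900],
})

price_by_type = (
    df.assign(type=df["type"].fillna("unknown"))
      .groupby("type")["price_warm"]
      .describe()[["count", "25%", "50%", "75%"]]  # quartiles per type
      .sort_values("50%", ascending=False)         # highest median first
)
print(price_by_type)
```

The quartile columns are the same numbers a box-and-whisker plot draws as the box edges and the median line.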

The results in table form will appear like this:

The box-and-whisker plot gives the visual form of the results like this:

The penthouses are the most expensive, followed by standard apartments, under-the-roof, and ground-floor apartments.

Property Prices

Price Per Area

A scatter plot shows what property size is available for rent at a given price. The scatter plot itself requires only two arrays, X and Y. Here, however, we will also create a list of property types and prices.

We will create three different arrays for the specific city: the area in square meters, the type, and the price.

Here, I substituted NULL property types with ‘Unbekannt,’ which is not required for the scatter plot itself but is used in the graph. We will then create a linear regression model and train it on the data points. It will help in drawing a linear approximation:
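
The text does not say which library the regression model comes from. As a stand-in, the sketch below fits a straight line with NumPy's `polyfit`, which gives the same least-squares line for one input variable. The area and price values are invented and lie exactly on a line so the result is easy to check.

```python
import numpy as np

# Hypothetical data points: price = 20 * area + 100, with no noise.
area = np.array([30.0, 45.0, 60.0, 80.0, 100.0])
price = np.array([700.0, 1000.0, 1300.0, 1700.0, 2100.0])

# Least-squares fit of a first-degree polynomial (a straight line).
slope, intercept = np.polyfit(area, price, deg=1)

# Two end points are enough to draw the approximation over the scatter plot.
line_x = np.array([area.min(), area.max()])
line_y = slope * line_x + intercept

print(round(slope, 2), round(intercept, 2))
```

With real listings the points scatter around the line, and the slope can be read as an approximate price per additional square meter.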

We will draw the results:

We will put the code in a separate get_figure_price_petr_area method to display different cities on the graph. By combining the figures in rows and columns, we can draw several Bokeh figures at once.

The plotted results will look like this:

We can also visually compare the number of properties available in the market.

Price and Area Histograms

Using a histogram, we can see the prices more compactly. The NumPy histogram method performs all the calculations.
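
A small sketch of the calculation, with invented prices and an assumed bin range of 500 to 2,500 euros:

```python
import numpy as np

# Hypothetical warm prices in euros.
prices = np.array([650, 800, 950, 1200, 1250, 1400, 1900, 2300])

# Four equal-width bins between 500 and 2500 euros.
counts, edges = np.histogram(prices, bins=4, range=(500, 2500))

print(counts)  # number of listings in each bin
print(edges)   # bin borders; there is one more edge than there are bins
```

Each bar of the chart can then be drawn from `edges[:-1]` (left side), `edges[1:]` (right side), and `counts` (height).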

We used the same approach to draw the graph for several cities together:

The results correlate with the scatter plot.

Munchen is the most expensive place, with a distribution peak at nearly €1,500, while the distribution for Berlin has two peaks. For the area in square meters, we will show the results only for Berlin.

Many houses and apartments have an area of 30 to 70 square meters. Some properties are smaller than 10 square meters, while some are larger than 250 square meters.

Utility Costs

All apartments have two prices: warm and cold. We will calculate the difference and draw a scatter plot.

From the above image, we see that the results vary a lot. Different types of houses have different insulation, heating, etc. A 50 square meter property has utility costs of nearly 200 euros per month. As the area doubles, the costs roughly double.

Deposit

First, we will find out what kind of data we have:

The results will appear like this:

Displaying unique values is easy. From the above image, we can see that the values differ a lot. Some owners give the amount as a number like ‘585 Euro’, while others use text abbreviations like ‘3 MM’.

The output shows text descriptions like ‘Drei Nettokaltmieten,’ ‘Zwei Monatsmiete,’ and so on. To parse the values, we created two methods that transform a text string into a numerical value.
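
The two methods themselves are not shown, so the following is only a sketch of how such a conversion could work. The function names, the short list of German number words, and the rule that any text mentioning "miete" or "MM" means "a number of monthly cold rents" are all assumptions.

```python
import re

# Small, incomplete mapping of German number words; extend as needed.
NUMBER_WORDS = {"ein": 1, "eine": 1, "zwei": 2, "drei": 3}


def words_to_digits(text):
    """Lower-case the text and replace German number words with digits."""
    tokens = text.lower().split()
    return " ".join(str(NUMBER_WORDS.get(token, token)) for token in tokens)


def parse_deposit(text, cold_price):
    """Convert a deposit string into an amount in euros, or None if unknown."""
    if not text:
        return None
    cleaned = words_to_digits(text)
    # Either a number with German thousands dots (1.500) or a plain number
    # with an optional decimal comma (2,5).
    match = re.search(r"\d{1,3}(?:\.\d{3})+|\d+(?:,\d+)?", cleaned)
    if match is None:
        return None
    number = match.group()
    if "." in number:
        value = float(number.replace(".", ""))
    else:
        value = float(number.replace(",", "."))
    # Assumed rule: rent-based wording means "number of monthly cold rents".
    if re.search(r"miete|\bmm\b", cleaned):
        return value * cold_price
    return value


print(parse_deposit("3 MM", 1000))
print(parse_deposit("585 Euro", 1000))
```

Dividing the parsed amount by the cold price then gives a deposit-to-price ratio that is comparable across listings.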

Using these methods, you can do the conversion like this:

Creating a column in the dataset with a deposit-to-price ratio is now easy.

Using this new column, you can easily plot the histogram:

Property Owners

Some owners prefer to rent out their properties themselves, while others seek an agency's help. To understand this, we will draw the distribution as a pie chart.

In the above code, the data frame is grouped by publisher, and the results are sorted by group size. We use a different color for each group.

The results for Berlin and Munchen will appear like this:

In Berlin, 8.5% of the real estate listings are by private individuals. In Munchen, it is 27%. A few agencies publish more than 50% of the properties.

Floor Numbers

Many houses and apartments do not have a specific floor number. We marked those as ‘unknown’ and sorted the values by implementing a custom key in Pandas. The tricky part is that, when sorting a DataFrame, Pandas applies the custom key not to a single value but to a whole ‘pd.Series’ object. Hence, we need a second method to update the values in the series.
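
The two-method pattern can be sketched like this. The function names and the sample counts are hypothetical; the point is that the function passed as `key` receives a whole Series and must return one.

```python
import pandas as pd


def floor_sort_value(value):
    """Map one floor label to a sortable number; 'unknown' goes last."""
    return float("inf") if value == "unknown" else float(value)


def floor_sort_key(series):
    """Key for sort_values: receives a whole Series and returns a Series."""
    return series.map(floor_sort_value)


# Hypothetical counts of listings per floor label.
floors = pd.DataFrame({
    "floor": ["10", "unknown", "2", "0"],
    "count": [5, 40, 120, 60],
})

sorted_floors = floors.sort_values(by="floor", key=floor_sort_key)
print(sorted_floors)
```

Without the key, the labels would be sorted as text, which puts "10" before "2".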

The results for Berlin and Munchen will appear like this:

We can see that most apartments in both cities are on the 1st to 5th floors, but some are on the 10th to 20th floors. Exceptionally, one apartment in Berlin is listed on the 87th floor.

Geo Visualization

So far, we have built histograms. Here, we will display the real estate objects on a geographic map. The two challenges we face are getting the coordinates and drawing the map.

Geocoding

Let's check our data again. The data frame has fields like address and region. These fields can be used for geocoding.

To find the coordinates, let’s use the GeoPy library.

Although this was very simple, removing the “(” and “)” brackets from the addresses was a significant challenge. Using ‘lru_cache’, it’s easy to avoid requesting the same location twice.
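
The sketch below shows the cleaning and caching idea without any network access. `fake_geocode` is a stand-in that returns made-up coordinates; in real code its body would call the geocoder. The function names and the sample address are hypothetical.

```python
import re
from functools import lru_cache

call_count = 0  # counts how often the (fake) geocoder is really called


def clean_address(address):
    """Remove round brackets and collapse repeated whitespace."""
    without_brackets = address.replace("(", "").replace(")", "")
    return re.sub(r"\s+", " ", without_brackets).strip()


def fake_geocode(address):
    """Stand-in for a real geocoder call; returns fixed made-up coordinates."""
    global call_count
    call_count += 1
    return (52.52, 13.405)


@lru_cache(maxsize=None)
def get_coordinates(address):
    """Geocode a cleaned address once; repeats are served from the cache."""
    return fake_geocode(address)


address = clean_address("Musterstraße 1, Hermsdorf (Reinickendorf), Berlin")
first = get_coordinates(address)
second = get_coordinates(address)  # answered from the cache
print(address, first, call_count)
```

Because many listings share a street or district, caching keeps the number of geocoding requests well below the number of rows.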

Map

For drawing the map, we will use the free Folium library. Displaying a map with a marker takes only several lines of code:

The code gives a clear, interactive map without any API key:

We will use Folium's Circle for each property and group the properties by price with the help of ‘FeatureGroup’.

We have also used a heatmap to make the results look much better. The final results will appear like this:

The real estate objects with more than 5000/m Euro are distributed evenly. The result is more or less automatic. In Berlin, areas surrounding the center are more expensive.

Rent Dynamics

How quickly are properties rented, and how long do listings stay available? This is hard to answer directly, but we can estimate it by comparing the data from different days. Each property has a unique ID. We save the data for the same city at an interval of 7 days and display two price histograms: one for all properties and the other for those removed within the seven days.
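
Finding the removed listings comes down to comparing the IDs of the two snapshots. A minimal sketch with invented data and assumed column names:

```python
import pandas as pd

# Hypothetical snapshots of the same city taken seven days apart.
week_1 = pd.DataFrame({
    "property_id": [1, 2, 3, 4],
    "price_warm": [900, 1500, 2100, 3200],
})
week_2 = pd.DataFrame({
    "property_id": [2, 4, 5],
    "price_warm": [1500, 3200, 1100],
})

# Listings present in the first snapshot but missing from the second.
removed = week_1[~week_1["property_id"].isin(week_2["property_id"])]
removed_share = len(removed) / len(week_1)

print(removed)
print(f"{removed_share:.0%} of the listings were removed within a week")
```

A listing can disappear for reasons other than being rented, so the share is only an estimate.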

To make the bars more readable, let’s add percentage labels. The result will appear like this:

Anomalies Detection

In this step, we will find some anomalies, that is, unusual and non-standard properties. For this, let’s use the Isolation Forest algorithm. We will use three features: area, price, and number of rooms.

In the above code, the algorithm needs only one parameter, known as contamination, which determines the proportion of outliers. Let's set it to 1%. We get the result after calling the 'fit' method. The 'decision_function' method returns the anomaly score. The 'predict' method returns +1 if the object is an inlier and -1 if it is an outlier.
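
The sketch below runs the same steps on synthetic data, assuming the scikit-learn implementation of Isolation Forest. The 200 generated listings and the one extreme listing appended at the end are invented for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic listings: area in square meters, warm price in euros, rooms.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(60, 10, n)
price = area * 20 + rng.normal(0, 50, n)
rooms = rng.integers(1, 4, n)
X = np.column_stack([area, price, rooms])

# Append one deliberately extreme listing as the last row.
X = np.vstack([X, [1000.0, 50000.0, 20.0]])

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(X)

scores = model.decision_function(X)  # lower values are more anomalous
labels = model.predict(X)            # +1 for inliers, -1 for outliers

print(labels[-1], scores[-1])
```

With contamination set to 1%, roughly one listing in a hundred is flagged, whether or not it is a real error in the data.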

The result is:

To explain the results graphically, let’s use the SHAP Python package.

Let’s examine the property with the number 3030.

We found that the price was acceptable, but the algorithm treated the 211 square meter area and the 5 rooms as unusual. Let’s check how the algorithm works by displaying a scatter plot of how the number of rooms and the price impact the Shapley values.

The result will appear like this:

Here, we can see that a number of rooms above 4 affects the score the most.

Word Cloud

Here, we will find which words are the most frequent in the listing titles.

Using the Python WordCloud library, we can do this in several lines of code:
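
The drawing itself needs the WordCloud package, but the counting behind it can be sketched with the standard library alone. The titles and the short stop-word list below are invented for the example.

```python
import re
from collections import Counter

# Hypothetical listing titles.
titles = [
    "Helle Wohnung mit Balkon",
    "Moderne Wohnung in Berlin",
    "Schöne Wohnung mit Balkon und Garten",
]

# Assumed stop words: short filler words that say nothing about a listing.
STOP_WORDS = {"mit", "und", "in", "im"}


def word_frequencies(texts):
    """Count lower-cased words across all texts, skipping stop words."""
    words = []
    for text in texts:
        for word in re.findall(r"[a-zäöüß]+", text.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words)


frequencies = word_frequencies(titles)
print(frequencies.most_common(3))
```

A dictionary of word counts like this is the kind of input that the WordCloud class accepts through its `generate_from_frequencies` method.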

The result will appear like this:

Words like apartment, room, bright, modern, beautiful, and balcony appear most often.

For more information, get in touch with Actowiz Solutions now! You can also reach us for all your web scraping service and mobile app data scraping service requirements.
