How to Scrape Data for Your Apartment Rent Pricing App: A Step-by-Step Guide

Welcome to the first blog in our series, "Creating an Apartment Rent Price End-to-End App." If you're new to this project, we recommend reading the overview blog first to understand the process better. In this initial step, we'll focus on securing a suitable and relevant dataset, the usual first step in most data science projects. We'll do this by using web scraping to gather the necessary data. If web scraping is unfamiliar to you, don't worry! Skim through the web scraping tutorial we prepared earlier to familiarize yourself with the concept before diving in. Let's start creating our dataset for the Apartment Rent Pricing App!

TL;DR

Here's a link to the code on GitHub:

Getting Started with Web Scraping: Gathering Apartment Listings from https://www.propertypro.ng/

In this initial phase of building our Apartment Rent Pricing App, we will use web scraping to extract apartment listings from https://www.propertypro.ng/ and save the data in CSV files for later use. To extract the required information efficiently, we will pay close attention to the website's layout and structure, so take a look at the site to familiarize yourself with its appearance before we begin scraping. Let's prepare to gather the loot and save it for further use in our project!


To gather the relevant apartment data for our project, we will follow a series of steps on the website https://www.propertypro.ng/. First, we'll click on the "Rent" option, then type "Lagos" in the search bar located below the "Rent" section. Afterward, we will click the "Type" dropdown menu and select "Flats and Apartments." Finally, we'll click on the "Search" button. The resulting page will display the filtered apartment listings, serving as our primary data source for the Apartment Rent Pricing App development.


Upon scrolling down, you will encounter a display similar to the following:


Exploring the Website Layout and Required Data Fields

As we examine the layout of the website, we can observe that each apartment listing consists of the following essential information:

  • Title
  • Address
  • Price
  • House perks (encased in purple)
  • Description with a "Read more" link
  • Details of bedrooms, baths, and toilets

We aim to retrieve these specific details for each apartment listing on the site.

To accomplish this task, we will use the Python Requests library to retrieve the webpage data and BeautifulSoup to parse the HTML and extract the desired information. Let's start by importing the necessary libraries.


Importing Essential Libraries and Retrieving Webpage Data

As we begin the web scraping process, it's good to have NumPy and Pandas on hand; they may prove helpful later in the project, even if we don't need them immediately.

To begin, copy the URL of the current results page (the filtered listings we opened earlier) and pass it to the get function of the Requests library. Once we have retrieved the page's contents, we parse them with BeautifulSoup.

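The original code for this step is not reproduced in the text, so here is a minimal sketch of it. The helper names make_soup and fetch_soup, the "html.parser" backend, and the 30-second timeout are our own choices rather than the author's:

```python
import numpy as np   # not needed yet, but handy later in the project
import pandas as pd  # not needed yet, but handy later in the project
import requests
from bs4 import BeautifulSoup


def make_soup(html: str) -> BeautifulSoup:
    """Parse raw HTML using Python's built-in "html.parser" backend."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url: str) -> BeautifulSoup:
    """Download a page with Requests and hand its HTML to BeautifulSoup."""
    response = requests.get(url, timeout=30)  # 30 s is an arbitrary choice
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return make_soup(response.text)
```

Usage: call soup = fetch_soup(url) with the results-page URL you copied, then print(soup) to look at the result.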

Printing soup lets us check the page's content and structure.

To figure out how to parse the data, we need to inspect the page elements. Inspecting the page reveals the HTML tags and classes that encapsulate the relevant information, such as the apartment title, address, price, house perks, description, and details of bedrooms, baths, and toilets. This information will guide us in writing the code that extracts the data from each apartment listing on the webpage.


To use the inspect tool effectively, follow these steps:

  • Open the website's apartment rent listings page (the one we opened earlier).
  • Right-click anywhere on the page.
  • Click "Inspect" in the context menu to open the browser's developer tools.
  • The developer tools open alongside the page and show the page's HTML code.
  • Look for the small arrow icon (the element selector) in the top-left corner of the developer tools panel.
  • Click the arrow to activate element selection mode.
  • With element selection mode active, move your cursor over the data you want to extract on the webpage.
  • As you hover over the data, the corresponding HTML element is highlighted in the developer tools, showing the part of the HTML code responsible for displaying that data on the page.

Following these steps, you can identify the specific HTML elements and classes associated with the data you want to scrape. This information will guide us in creating the necessary code to extract the relevant details from the apartment listings.


By expanding the nested elements (the small triangle next to each tag) in the developer tools, you can explore further and see the HTML tags that encapsulate each piece of information:

  • The address is enclosed within an h4 tag.
  • The price is enclosed within an h3 tag inside a div tag.
  • The description is enclosed within a div tag.
  • Other relevant data, such as house perks and the bedroom, bath, and toilet details, sit inside div tags with their own class names.

Understanding the HTML structure of the webpage is crucial as it enables us to target the correct elements during the web scraping process. With this knowledge, we can write the code that extracts the required data from each apartment listing on the website.

Pull Out the Individual Specifications

Let's proceed with building the code to extract all the individual specifications of each apartment listing. We'll use the information we gathered from inspecting the webpage to target the relevant HTML elements. Here's how we can achieve it:

listing_divs = soup.select('div[class="single-room-sale listings-property"]')

This code uses BeautifulSoup's select function to retrieve every div element whose class attribute is exactly "single-room-sale listings-property" and saves them in a list named listing_divs. (Quoting the attribute value avoids the backslash-escaped space, which Python flags as an invalid escape sequence.) Let's check the number of elements in listing_divs using the len function:

print("Number of apartment listings on a page:", len(listing_divs))

Here are the results:

Number of apartment listings on a page: 20

The 20 elements in listing_divs correspond to the 20 apartment listings on the page; each element holds one listing and its details.
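The attribute selector used for listing_divs compares the entire class attribute as a single string, so it stops matching if the site adds, removes, or reorders a class on a listing. A sketch comparing it with a CSS class selector, run on a small made-up snippet rather than the real page:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; not copied from propertypro.ng.
SAMPLE_HTML = """
<div class="single-room-sale listings-property"><h4>Ikota Lekki Lagos</h4></div>
<div class="single-room-sale listings-property featured"><h4>Old Ikoyi Ikoyi Lagos</h4></div>
<div class="sidebar"><h4>Not a listing</h4></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact match on the whole class string: misses the div with an extra class.
exact = soup.select('div[class="single-room-sale listings-property"]')

# Class selector: matches any div carrying BOTH classes, in any order.
by_class = soup.select("div.single-room-sale.listings-property")

print(len(exact), len(by_class))  # 1 2
```

Which form suits the real site is a judgment call: the class selector is more tolerant, but it would also pick up any other div that happens to carry both classes.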

Before extracting the features we need, let's look at the first element in listing_divs:

listing_divs[0]

This displays the HTML markup of the first apartment listing.

By examining the HTML structure and identifying the relevant tags, we can now extract the features we need from each apartment listing, one by one. The address is enclosed in an h4 tag, so this snippet retrieves it from the first element in listing_divs:

listing_divs[0].select('h4')[0].text

It outputs the address of the first apartment listing:

'Ikota Lekki Lagos'

The same approach works for the other features, such as price, description, and details of bedrooms, baths, and toilets.

That's very easy! Next, let's scrape the price, which is enclosed in an h3 tag whose class contains "listings-price". We use the strip() method to remove any leading and trailing whitespace around the extracted value:

listing_divs[0].select('h3[class*=listings-price]')[0].text.strip()

'N 2,800,000'

Next are the totals for bedrooms, bathrooms, and toilets. All three sit in a div tag whose class name contains "fur-areea". We use the text attribute to get the div's entire content as a single string, strip() to remove leading and trailing whitespace, and then split the string on newline characters to separate the individual features:

listing_divs[0].select('div[class*=fur-areea]')[0].text.strip().split('\n')

['3 beds', '3 baths', '4 Toilets']

The last characteristics to scrape are the house perks and the description.

The "Serviced" label enclosed in purple and the line directly under it can provide valuable additional information about the apartment that might not be evident from the price and number of rooms alone. The code below uses the eighth listing (index 7), which carries these perks; not every listing necessarily has them. The perks sit in a div tag whose class name contains "furnished-btn". We replace newline characters with a single space and then strip off any leading and trailing whitespace:

listing_divs[7].select('div[class*=furnished-btn]')[0].text.replace('\n', ' ').strip()

'Serviced Newly Built'

Next, let's extract the line directly under the "Serviced" and "Newly Built" labels. It lives in the p tag inside a div whose class name contains "result-list-details". We take the p tag's text, remove "Read more" and "FOR RENT:" by replacing them with empty strings, and finally strip off any extra whitespace:

listing_divs[7].select('div[class*=result-list-details]')[0].p.text.replace('Read more', '').replace('FOR RENT:', '').strip()

3 bedroom Flat/Apartment for rent Old Ikoyi Ikoyi Lagos...

Let's put all the code together to extract the necessary information from every apartment listing on the page, and test it to make sure it works as expected.

Test Everything Together

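The original combined code is not reproduced in the text. As a minimal sketch, the snippets above can be folded into one helper per listing plus a loop over the listings. Two choices here are our own assumptions: select_one (a real BeautifulSoup method that returns None instead of raising IndexError when a listing lacks an element, such as the perks tag), and merging the perks and description into the "Description & Extra details" column with a " | " separator. The helper names text_or_none, parse_row, and listings_to_frame are hypothetical:

```python
from typing import Optional

import pandas as pd
from bs4 import BeautifulSoup

LISTING_SELECTOR = 'div[class="single-room-sale listings-property"]'
COLUMNS = ["Address", "Price", "Rooms", "Description & Extra details"]


def text_or_none(tag) -> Optional[str]:
    """Whitespace-normalised text of a tag, or None if the tag is missing."""
    return " ".join(tag.stripped_strings) if tag is not None else None


def parse_row(listing) -> list:
    """Extract one row of COLUMNS from a single listing <div>."""
    # select_one returns None (instead of raising IndexError) when a tag is absent.
    rooms = listing.select_one("div[class*=fur-areea]")
    perks = text_or_none(listing.select_one("div[class*=furnished-btn]"))
    description = text_or_none(listing.select_one("div[class*=result-list-details] p"))
    if description:
        description = description.replace("Read more", "").replace("FOR RENT:", "").strip()
    return [
        text_or_none(listing.select_one("h4")),                         # address
        text_or_none(listing.select_one("h3[class*=listings-price]")),  # price
        list(rooms.stripped_strings) if rooms is not None else [],      # beds/baths/toilets
        # Assumption: perks and description share one column, joined by " | ".
        " | ".join(part for part in (perks, description) if part),
    ]


def listings_to_frame(soup: BeautifulSoup) -> pd.DataFrame:
    """Build a DataFrame with one row per listing on a results page."""
    rows = [parse_row(div) for div in soup.select(LISTING_SELECTOR)]
    return pd.DataFrame(rows, columns=COLUMNS)
```

With the soup fetched earlier, df = listings_to_frame(soup) should produce one row per listing on the page.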

Combining the snippets from the previous steps produces a DataFrame containing the extracted information for all 20 apartment listings on the page, with columns for Address, Price, Rooms (beds, baths, and toilets), and Description & Extra details. From there, you can analyze or manipulate the data further using Pandas.


Dynamic Function for Retrieving Apartment List Data

Let's wrap up the task by creating a dynamic function, parse_listing_data, that retrieves apartment listings based on the city's name, converts the data into a Pandas DataFrame, and saves it locally as a CSV file.

The parse_listing_data function is designed to scrape apartment listings data from a web page based on the provided location (city). It allows for an optional parameter max_price, which acts as a filter to exclude overpriced apartments. The function also accepts the number of apartment listings num_listings that the user wants to retrieve.

Here's a summary of the function's workflow:

  • Initialize an empty list called all_listings_data and set page_num to 0.
  • Enter a while loop that runs until the number of pages scraped reaches num_listings divided by 20 (the number of listings per page). For example, if num_listings is 200, it scrapes 10 pages.
  • Build the URL for each page by combining the base URL with location, max_price, and page_num using string concatenation.
  • Use the requests.get function to retrieve the HTML from the URL and parse it with BeautifulSoup.
  • Select all the apartment listings on the page. If there are none (listing_divs has length 0), the loop terminates.
  • Loop through each listing and extract its address, price, numbers of bedrooms, bathrooms, and toilets, and description.
  • Append the extracted data for each listing to the all_listings_data list.
  • Increment page_num by 1 to move to the next page.
  • After the loop ends, convert all_listings_data into a Pandas DataFrame with appropriate column names.
  • Save the DataFrame as a CSV file whose filename is suffixed with the provided location name.
  • Return the DataFrame.

Following this process, the function can dynamically scrape apartment listings data for the specified location, filter out high-priced apartments if required, and store the data in a Pandas DataFrame and CSV file for further analysis or usage.
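The function's own code is not reproduced in the text, so here is a minimal sketch that follows the workflow above. Several pieces are assumptions: URL_TEMPLATE is a hypothetical URL pattern (check the site's real search URL and query-parameter names in your browser before using it), the CSV filename prefix is our own, and the extra fetch and output_dir parameters exist only so the download step and output location can be swapped out, for example in tests. The URL is built with str.format rather than concatenation, which is equivalent here:

```python
import math
from pathlib import Path
from typing import Callable, Optional

import pandas as pd
import requests
from bs4 import BeautifulSoup

LISTINGS_PER_PAGE = 20  # observed on the results page earlier
LISTING_SELECTOR = 'div[class="single-room-sale listings-property"]'
COLUMNS = ["Address", "Price", "Rooms", "Description & Extra details"]

# HYPOTHETICAL URL pattern: confirm the real path and query parameters on
# https://www.propertypro.ng/ before relying on it.
URL_TEMPLATE = (
    "https://www.propertypro.ng/property-for-rent/flat-apartment/in/{location}"
    "?max_price={max_price}&page={page}"
)


def fetch_html(url: str) -> str:
    """Default downloader: GET the page and return its HTML text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def text_or_none(tag) -> Optional[str]:
    """Whitespace-normalised text of a tag, or None if the tag is missing."""
    return " ".join(tag.stripped_strings) if tag is not None else None


def parse_row(listing) -> list:
    """Extract one row of COLUMNS from a single listing <div>."""
    rooms = listing.select_one("div[class*=fur-areea]")
    perks = text_or_none(listing.select_one("div[class*=furnished-btn]"))
    description = text_or_none(listing.select_one("div[class*=result-list-details] p"))
    if description:
        description = description.replace("Read more", "").replace("FOR RENT:", "").strip()
    return [
        text_or_none(listing.select_one("h4")),
        text_or_none(listing.select_one("h3[class*=listings-price]")),
        list(rooms.stripped_strings) if rooms is not None else [],
        " | ".join(part for part in (perks, description) if part),
    ]


def parse_listing_data(
    location: str,
    max_price: Optional[int] = None,
    num_listings: int = 200,
    fetch: Callable[[str], str] = fetch_html,
    output_dir: str = ".",
) -> pd.DataFrame:
    """Scrape whole result pages (about num_listings rows) and save them as CSV."""
    all_listings_data = []
    page_num = 0
    max_pages = math.ceil(num_listings / LISTINGS_PER_PAGE)  # e.g. 200 -> 10 pages
    while page_num < max_pages:
        url = URL_TEMPLATE.format(
            location=location,
            max_price="" if max_price is None else max_price,
            page=page_num,
        )
        soup = BeautifulSoup(fetch(url), "html.parser")
        listing_divs = soup.select(LISTING_SELECTOR)
        if len(listing_divs) == 0:  # no more results: stop early
            break
        all_listings_data.extend(parse_row(div) for div in listing_divs)
        page_num += 1

    df = pd.DataFrame(all_listings_data, columns=COLUMNS)
    # Hypothetical filename pattern, suffixed with the location name.
    df.to_csv(Path(output_dir) / f"apartment_listings_{location}.csv", index=False)
    return df
```

Because the loop always fetches whole pages, the result can slightly exceed num_listings when it is not a multiple of 20; add df.head(num_listings) if you need an exact count.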

Use the Function to Get Data for Various Cities

To test the function for Lagos, define the arguments: location (set to "lagos"), max_price (optional; a maximum price that filters out overpriced apartments), and num_listings (the number of apartment listings you want to retrieve). Then call parse_listing_data with these arguments. The function scrapes Lagos listings according to max_price and num_listings and returns a Pandas DataFrame that you can display to inspect the results.

The same steps work for other cities: just provide the appropriate location name and other relevant arguments.

lagos_data = parse_listing_data('lagos', 2500000, 5000)
lagos_data

Here, we pass in lagos as our preferred location, 2.5 million naira as our maximum price, and 5,000 as the number of rows we want.

The process is the same for Ibadan. Here, the call passes 'oyo' (the state in which Ibadan is located) with a maximum price of 3 million naira and 2,000 rows of data. As before, the function also saves the data locally as a CSV file:

ibadan_data = parse_listing_data('oyo', 3000000, 2000)
ibadan_data

We can run the function the same way for Abuja, Ogun, and Port Harcourt.

Here are the approximate numbers of rows retrieved for each location:

  • Abuja: Approximately 700 rows of apartment listings data.
  • Ogun: Approximately 190 rows of apartment listings data.
  • Port Harcourt: Approximately 100 rows of apartment listings data.
  • Ibadan: Approximately 512 rows of apartment listings data.

With that, we have built a dynamic web scraping function that retrieves apartment listings for various cities, optionally filters them by price, and stores the data in a Pandas DataFrame and a CSV file.

Conclusion

In this tutorial, we walked through the step-by-step process of building apartment listings datasets by scraping data from the Nigerian real estate listings website https://www.propertypro.ng/. The code for this tutorial is available on GitHub for easy reference.

The next crucial phase is Data Wrangling, where we will focus on cleaning and preparing the scraped data into a format suitable for further analysis. This step typically consumes a significant portion of our development time, accounting for over 40% of the total effort.

Thank you for following along! If you have any questions or need assistance, feel free to contact Actowiz Solutions. We offer mobile app scraping, instant data scraper, and web scraping services to suit your specific requirements.

Until next time, happy coding and data wrangling! Goodbye, folks!
