How to Find a Grocery Delivery Slot in a Smart Way

In this blog, we will use Python, Twilio, and Heroku to extract data from a grocery website API and receive a text notification when delivery slots become available.

We live in extraordinary times.


And with extraordinary times come new challenges. One such challenge was preserving grocery supply chains with millions of people under lockdown due to Covid-19. For vulnerable people who are isolated or unable to get to the supermarket physically, the only accessible option is booking a supermarket delivery slot online. However, with massive demand for these services, it has become notoriously difficult to get an available slot, leaving many people logging in around the clock to check for openings.

That got us thinking about the ever-growing number of problems we face, and how we could use Python to automate this process for us.

Reviewing a Grocery Website

The first step towards our goal of an automated delivery slot checker is working out how we can programmatically retrieve the data we need from the grocery website.

After choosing ASDA as our grocery site, creating an account, and entering a delivery postcode, we arrive at the delivery slot page shown below.

[Screenshot: the ASDA delivery slot booking page]

Here we can see a neatly formatted table of dates, times, and the availability of each slot. Naturally, every slot is currently showing 'Sold Out.' However, we can now clearly see the data we want our tool to target.

If you've done any data scraping or web development before, you'll already be familiar with the built-in DevTools functionality of most major browsers. For those who aren't, it is a set of tools that lets you inspect a webpage and study its HTML, CSS, and JavaScript, and, critically for this project, the metadata associated with the network requests made to and from the server. The following step is perhaps the most important one.


With the DevTools window open, we can start to see what is happening behind the scenes to keep the slot availability table up to date. Navigating to the 'Network' tab of the DevTools window gives us access to all the network requests the website makes to fetch the latest data it displays. Refreshing the page produces a list of requests, one of which must hold the key to where the slot availability data is coming from.


This list can look a little overwhelming, because it contains a sea of different requests, covering everything from the CSS that describes the page's formatting to the JavaScript that drives its functionality. We are interested in the requests that fetch the data presented on the page, so filtering for requests of type 'XHR' (XMLHttpRequest) lets us focus on requests for data from the server and ignore those concerned with page styling. That still leaves a few requests to inspect; luckily, a reasonable guess that the relevant requests will contain the word 'slot' narrows the search down to four remaining requests.


Clicking on a request and selecting the 'Response' tab reveals the JSON response it produced and, therefore, the data supplied to the webpage. From this we can quickly see that the request containing the data we are after is the POST request to the URL https://groceries.asda.com/api/v3/slot/view. Looking at the 'Params' tab, we can also see the JSON data the browser sends in the POST request; right-clicking and selecting 'Copy All' copies this JSON data to the clipboard, giving us everything we need to tell Python how to fetch the data.

Make a Web Request Using Python

Python's Requests library makes it very easy to send HTTP requests programmatically. From our inspection of the website, we know the URL we want to send the request to, the type of request we want to use (POST), and the JSON data we need to send (currently stored in the clipboard).

In practice, this gives us code along the lines of the sketch below.

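The original post shows this step as a screenshot. A minimal sketch of what the script looks like follows; the payload keys and environment variable names (ASDA_ACCOUNT_ID, ASDA_ORDER_ID, ASDA_POSTCODE) are illustrative placeholders rather than ASDA's actual schema, which you should copy from the 'Params' tab in DevTools.

```python
import os
from datetime import datetime, timedelta

import requests

SLOT_URL = "https://groceries.asda.com/api/v3/slot/view"

# Always look two weeks ahead of today; the exact format string should match
# the dates in the JSON payload copied from DevTools.
start_date = datetime.now().strftime("%Y-%m-%dT%H:%M:%S+00:00")
end_date = (datetime.now() + timedelta(days=14)).strftime("%Y-%m-%dT%H:%M:%S+00:00")

# Payload pasted from the clipboard, with dates and account details swapped
# out for variables. The keys below are placeholders, not the real ASDA schema.
data = {
    "data": {
        "start_date": start_date,
        "end_date": end_date,
        "customer_info": {
            "account_id": os.environ["ASDA_ACCOUNT_ID"],
            "order_id": os.environ["ASDA_ORDER_ID"],
        },
        "service_address": {
            "postcode": os.environ["ASDA_POSTCODE"],
        },
    }
}

# POST the JSON payload and keep the response object for later.
r = requests.post(SLOT_URL, json=data)
```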

We have pasted the JSON data from the clipboard and added a simple request, posting the data to the URL via the json argument of the requests.post() method. The response object returned by the request is stored in the variable r for later use.

We have also replaced some of the parameters in the data with variables. The start_date and end_date variables make the date range dynamic, because we are always interested in looking two weeks ahead of the current date. The strftime method of datetime objects lets us specify the exact string format required, matching the format we saw in the JSON data we copied earlier.

The parameters stored as os.environ variables are sensitive details that we don't want publicly visible on GitHub. Later, we will see how to store this data securely while still making it available to the script.

We now have a fully working Python script that we can use to send requests to Asda's API and store the response object we get back. Let's look at that response object and work out how to parse it to extract the data we're after.

Parsing the JSON Response

Our response object r contains all the data and metadata returned by the POST request to Asda's API. First, we need to check whether our request to the server was successful or whether something went wrong. To do that, we can inspect the status_code attribute of the response object.

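A sketch of that check, assuming the response object r from the previous snippet:

```python
# A status code of 200 means the request succeeded; anything else needs
# investigating before we try to parse the body.
print(r.status_code)

if r.status_code != 200:
    raise RuntimeError(f"Slot request failed with status code {r.status_code}")
```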

If it doesn't return 200, something has gone wrong with the request and we need to double-check that the URL and data are correctly formatted. The complete list of possible HTTP status codes is available here, but in practice we will mostly see 200 meaning 'OK', and 400 or 404 meaning 'Bad Request' and 'Not Found' respectively.

Assuming we have a 200 status code, we are ready to look at the data contained in the response. Since it is standard for APIs to return data in JSON format, Requests comes with a built-in JSON decoder.

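Decoding the body is a one-liner with that built-in decoder:

```python
# Decode the JSON body into ordinary Python dictionaries and lists.
slot_data = r.json()
print(slot_data)
```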

Printing the value of r.json() to the terminal quickly reveals that we have received a large amount of data back from the server relating to slot availability, pricing, capacity, and so on. Since slot availability is what we care about for this project, we can loop through the JSON response and fill a dictionary with slots and their availability.

We first loop through each slot day in the two-week window we requested, and within each day through every individual slot, filling the dictionary as we go.

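The original shows the loop itself as a screenshot. A rough sketch is below; the key names (slot_days, slots, slot_info, start_time, status) are guesses at the response structure and should be checked against the actual JSON seen in DevTools.

```python
# Map each slot's start time to its availability status.
available_slots = {}

for day in slot_data["data"]["slot_days"]:   # one entry per delivery day
    for slot in day["slots"]:                # individual slots on that day
        info = slot["slot_info"]
        available_slots[info["start_time"]] = info["status"]

# Keep only the slots that are not sold out.
open_slots = {
    start: status
    for start, status in available_slots.items()
    if status != "UNAVAILABLE"
}
```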

Now that we have all the data we need and a way of extracting it programmatically whenever we like, let's look at how we can notify our end users when a delivery slot becomes available.

Twilio: Sending a Text Using Python


Twilio is a cloud communications platform providing APIs that let developers send and receive text messages and phone calls from their projects and apps. It opens up a whole world of possibilities: automated SMS notifications, two-factor authentication, chatbots, and more. Here, we will build a simple text notification system so that we receive a text with the details of any available delivery slots whenever the script runs.

Although Twilio is a paid service, it offers a free trial worth about £13. To get started with Twilio, we sign up on the website (no payment details needed) and choose a phone number. Once that is done, Twilio gives us an account SID and authentication token for the project. The trial credit is more than enough to get us started, given that it costs roughly £0.08 to send a text.

Once the Twilio account is set up, we can start using the Python API that Twilio provides. The Twilio module for Python can be installed using pip.

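For reference, the standard install command is:

```
pip install twilio
```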

The Twilio API for Python is straightforward to get started with, and extensive documentation is available at https://www.twilio.com/docs. Sending a text from our newly acquired phone number takes only a few lines.

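A minimal sketch using Twilio's Python client is shown below; the account SID, auth token, and phone numbers are read from hypothetical environment variables rather than hard-coded.

```python
import os

from twilio.rest import Client

# Credentials from the Twilio console, kept out of the source code.
account_sid = os.environ["TWILIO_ACCOUNT_SID"]
auth_token = os.environ["TWILIO_AUTH_TOKEN"]

client = Client(account_sid, auth_token)

# Send a text from the Twilio number to our own phone.
message = client.messages.create(
    body="Hello from Python!",
    from_=os.environ["TWILIO_NUMBER"],   # the number Twilio assigned to us
    to=os.environ["MY_PHONE_NUMBER"],    # the number to notify
)
print(message.sid)
```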

Incorporating this into our delivery slot script, we can check the data for available slots and, if any exist, send a text with the notification of our choice to the phone number of our choice. This is outlined in the final section of the script.

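Building on the earlier snippets, that final section might look roughly like this (again, a sketch rather than the exact code in the screenshot):

```python
# If any slots are open, text a summary of them; otherwise do nothing.
if open_slots:
    summary = "\n".join(f"{start}: {status}" for start, status in open_slots.items())
    client.messages.create(
        body=f"Delivery slots available!\n{summary}",
        from_=os.environ["TWILIO_NUMBER"],
        to=os.environ["MY_PHONE_NUMBER"],
    )
```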

We now have a complete script that checks Asda for available delivery slots and, if any are found, sends us a text notification. The only remaining step in the project is to find a way of running the script on its own, on a schedule.

Deploying to the Cloud: Heroku


Heroku is a cloud computing platform that lets developers deploy projects and apps to the cloud. It is particularly good for running web apps with minimal setup, making it perfect for personal projects. Here we will use Heroku as an easy way to get our script running at scheduled intervals.

You can sign up for Heroku here.

The first step is to create a new app to house our project.


To get the script up and running in the cloud, we need to create a GitHub repository containing our script. You can find ours here for reference. We also need to create a file called requirements.txt, which lists all the package dependencies Heroku must install before it can run the script.

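For this project the file only needs the third-party packages the script imports, which here would be something like:

```
requests
twilio
```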

Next, we connect the app to the GitHub repository created for this project. Enabling 'automatic deploys' means that whenever we push to the main branch, the project is automatically redeployed with the latest changes, which is helpful if we want to keep developing the project while it is in production.


As mentioned earlier, there are several variables in the script that we want to keep secret. We can do this using 'Config Vars' in the Heroku app settings, an easy way of storing sensitive data for the project that can then be accessed as environment variables.

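Heroku exposes Config Vars to the running app as ordinary environment variables, so the script can read them exactly as it would locally; for example, assuming the variable names used earlier:

```python
import os

# Each Heroku Config Var appears to the script as an environment variable.
account_sid = os.environ["TWILIO_ACCOUNT_SID"]
auth_token = os.environ["TWILIO_AUTH_TOKEN"]
postcode = os.environ["ASDA_POSTCODE"]
```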

The last step is getting our script to run automatically on a schedule. To do that, we need to install an add-on for the app. The Heroku Scheduler add-on lets us run jobs every 10 minutes, every hour, or every day.

Once Heroku Scheduler is installed, we can create a new job, choosing the frequency and the command we would like to run. As slots go very quickly, every 10 minutes is the best frequency for the job. The run command simply runs the Python script.

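Assuming the script is saved as check_slots.py (substitute your own filename), the scheduled command is simply:

```
python check_slots.py
```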

Now we can sit back, relax, and wait for the text notifications!

Conclusion

We have developed several skills through this project, which opens up a world of possibilities for new projects:

We can now inspect a site with DevTools, reverse engineer an API, and use Python's Requests library to scrape data: this gives us the skills needed to scrape data from nearly any publicly available website.

We have set up Twilio, a communications API that lets us make calls and send texts. It provides an easy way of sending notifications and also opens up further possibilities with Twilio: alert systems, chatbots, robo-callers, and more.

We have deployed the project using Heroku, allowing the script to run autonomously on a schedule in the cloud. This is an excellent skill to have: it removes the local dependency of running scripts on a PC or laptop and provides a great way to showcase projects online. Thanks a lot for reading this blog!

To know more, contact Actowiz Solutions! You can also reach us for all your mobile app and web scraping service requirements.
