How to Do Topic Modelling and Cuisine Classification Using NLP?

Actowiz Metrics Now Live!

Unlock Smarter, Faster Analytics!

In this post, we explore how NLP (Natural Language Processing) can be used to determine the culinary origin of an unfamiliar dish. We cover two approaches: cuisine classification based on ingredients, and topic modeling of meal descriptions.

Firstly, we will delve into cuisine classification by examining the ingredients. We can employ NLP techniques to identify patterns and associations that align with specific world cuisines by analyzing the dish's composition. This method involves training a model on a dataset of labeled recipes from various cuisines. The model learns the distinctive ingredient combinations that characterize each cuisine, enabling it to make predictions on new, unseen dishes.

Additionally, we will explore topic modeling of meal descriptions, which carry cultural and contextual information about a dish. Topic modeling extracts the latent topics in these descriptions, which lets us identify the key themes associated with different cuisines and infer a likely culinary origin from them.

By combining these two approaches, we can enhance the accuracy and robustness of our cuisine classification system. Using NLP in this context opens up exciting possibilities for automatically identifying the culinary heritage of dishes and expanding our knowledge and appreciation of diverse world cuisines.

What is NLP?

Natural Language Processing (NLP) refers to the capability of artificial intelligence systems to comprehend, interpret, and manipulate human language as humans do. This field aims to enable machines to understand and effectively interact with human language, whether it is in the form of spoken words or written text. NLP finds applications in various domains, including chatbots for customer service in industries like airlines and banking, spam filtering in email services like Google Mail, and voice-activated assistants like Siri on Apple devices.

NLP encompasses several vital components, such as speech recognition, which converts spoken language into written text; natural language understanding, which focuses on comprehending the meaning and intent behind human language; and text generation, which automatically produces coherent and contextually appropriate text.

In this project, we work with written text only. We look at how recipe text is cleaned, converted into numeric features, classified by cuisine, and grouped into topics, which gives a practical view of what NLP can do. So, let's embark on this journey into Natural Language Processing!

Methodology

Scraping the Website Data

For this project, we gathered data from two popular websites, "BBC Food" and "Epicurious." To accomplish this, we employed web scraping techniques using the BeautifulSoup library, which allowed us to extract information from the websites efficiently. As a result, we acquired a dataset comprising more than 5,000 entries, covering ingredients, explanations, and cooking methods for various dishes.
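As a rough illustration of the extraction step, here is a minimal sketch that pulls ingredient lines out of a recipe page. It deliberately uses Python's built-in html.parser instead of BeautifulSoup so it runs without extra installs. The markup it expects (list items with class "ingredient") and the names IngredientParser and extract_ingredients are assumptions made for this example, not the actual structure of BBC Food or Epicurious pages.

```python
from html.parser import HTMLParser


class IngredientParser(HTMLParser):
    """Collects the text of every <li class="ingredient"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_item = False
        self.ingredients = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; we require an exact class match.
        if tag == "li" and ("class", "ingredient") in attrs:
            self._in_item = True
            self.ingredients.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) still belongs to the current item.
        if self._in_item:
            self.ingredients[-1] += data


def extract_ingredients(html):
    """Return the ingredient lines found in an HTML string, with whitespace normalized."""
    parser = IngredientParser()
    parser.feed(html)
    parser.close()
    return [" ".join(item.split()) for item in parser.ingredients]
```

In a real scraper the HTML string would come from an HTTP response, and the selector would have to be adapted to each site's markup.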

Using the collected dataset, we developed a machine-learning model tailored explicitly to the task. We utilized the data from the "Ingredients" column as the primary input for the model. Training the model on this information made it adept at recognizing and analyzing various ingredients in different dishes.

Data Processing

Before constructing the model, a data cleaning process was performed to ensure the quality and consistency of the dataset. Several steps were taken to clean the data effectively.

To begin, punctuation marks were removed from the text, and all letters were converted to lowercase. This step helps in standardizing the text and avoiding any discrepancies due to case sensitivity.

Next, numerical values indicating quantity were eliminated from the data since they are not relevant for our analysis. This ensures that the focus remains solely on the ingredients themselves.

Additionally, stopwords were removed from the text. Stopwords are commonly used words that do not contribute significant meaning to the overall context. By eliminating stopwords, we can reduce noise and focus on more meaningful words in the dataset.

By performing these data cleaning steps, we are able to create a cleaner and more streamlined dataset, which ultimately improves the accuracy and effectiveness of the machine learning model and topic modeling techniques applied to the data.
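The cleaning steps above can be sketched in a few lines of standard-library Python. The stopword set below is a tiny hand-written stand-in chosen for this example (a real project would more likely use a full list such as the one shipped with NLTK), and the function name clean_ingredients is hypothetical.

```python
import re
import string

# Small illustrative stopword set; an assumption for this sketch, not the list used in the project.
STOPWORDS = {"a", "an", "and", "of", "or", "the", "to", "with", "into", "for"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_ingredients(text):
    """Lowercase, strip punctuation and digits, and drop stopwords; returns a token list."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\d+", " ", text)  # quantities such as "2" or "1/2" carry no cuisine signal
    return [tok for tok in text.split() if tok not in STOPWORDS]
```

Note that unit words such as "cups" or "tsp" survive this pass; whether to add them to the stopword set is a design choice the cleaning description leaves open.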

To further refine the dataset and reduce word variety, NLTK's 'WordNetLemmatizer' was applied. Lemmatization maps inflected forms to a common base form (for example, "berries" becomes "berry"), which shrinks the vocabulary the model has to learn and can improve its performance.

As part of the data preprocessing phase, an additional step was taken to remove rare words from the dataset. Some words that appeared infrequently or erroneously might have been collected during the web scraping process. To address this, the "Counter" class from Python's collections module was used to count the frequency of each word in the dataset so that the rarest words could be dropped.
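A minimal sketch of this rare-word filter is shown below. It assumes each document is already a list of tokens; the cutoff min_count=2 is an illustrative value, since the post does not say which threshold was used.

```python
from collections import Counter


def remove_rare_words(docs, min_count=2):
    """Drop tokens whose total frequency across all documents is below min_count."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in docs]
```

For example, a scraping artifact that appears only once in the whole corpus is removed, while common ingredients are kept.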

Exploratory Data Analysis

The graph illustrates the distribution of the target values, representing different world cuisines. It is evident that there is an imbalance in the dataset, where certain cuisines are more prevalent than others.

To address this issue and ensure a balanced representation of cuisines in the model, a strategy was implemented during the scraping process. Specifically, cuisines such as British and Irish, which exhibit significant similarities in terms of their culinary traditions, were grouped together as "British/Irish". Similarly, cuisines like Indian, Spanish, Pakistani, and Portuguese, which share commonalities in terms of ingredients and flavors, were combined as a single category.

By merging these similar cuisines, the dataset achieves a more balanced distribution among the target values. This is important for training the machine learning model, as it helps prevent bias towards overrepresented cuisines and ensures that all cuisines have a comparable impact on the learning process. Maintaining a balanced dataset enhances the model's ability to generalize and make accurate predictions across various cuisines.
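The label merging can be sketched as a simple lookup applied before counting the classes. Only the "British/Irish" grouping is named in the text, so that is the only mapping included here; the function names are hypothetical.

```python
from collections import Counter

# Mapping from scraped cuisine label to merged label; unlisted labels are kept unchanged.
MERGED_LABELS = {"British": "British/Irish", "Irish": "British/Irish"}


def merge_labels(labels, mapping=MERGED_LABELS):
    """Replace each label by its merged category, if one is defined."""
    return [mapping.get(label, label) for label in labels]


def class_distribution(labels):
    """Return a Counter of label frequencies, useful for spotting imbalance."""
    return Counter(labels)
```

Comparing class_distribution before and after merge_labels shows how much the merge narrows the gap between the largest and smallest classes.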

The word cloud visualization effectively depicts the relationship between cuisines and their corresponding ingredients in the "Ingredients" column. By examining the word cloud, it becomes evident that different cuisines have distinct ingredients, reflecting their unique culinary characteristics.

Given what we know about world cuisines, the generated word cloud aligns with our expectations. It highlights specific ingredients commonly associated with each cuisine, allowing us to gain insights into the key components and flavors that define different culinary traditions. This visualization not only presents the ingredients in a visually appealing way but also draws attention to the unusual and noteworthy ingredients used in each cuisine.

Cuisine Classification

During the data collection process, we created two important components: the target variable and the text column, which serves as a crucial feature for our machine learning model. However, in order for the machine to effectively understand and process the text column, we need to convert it into a numeric representation. There are several methods that can be employed for this purpose, and we will outline them before proceeding with the modeling phase.

CountVectorizer and TF-IDF (Term Frequency-Inverse Document Frequency)

To effectively use the text columns in machine learning algorithms, we need to convert them into numerical vectors. Two common approaches for text vectorization are CountVectorizer and TF-IDF.

CountVectorizer: This method creates a document-term matrix where every row represents a document and every column represents one unique word in the corpus. Each cell holds the number of times that word appears in that document.

TF-IDF (Term Frequency-Inverse Document Frequency): TF-IDF considers both the frequency of a word within a document and the rarity of that word across all documents (inverse document frequency). The resulting matrix reflects the weighted importance of words in the documents: words that appear in almost every recipe are down-weighted relative to words that are specific to a few.
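To make the two representations concrete, here is a from-scratch sketch in plain Python. It is not the scikit-learn implementation. The IDF formula used, ln((1 + n) / (1 + df)) + 1, is one common smoothed variant (it matches scikit-learn's default smoothing), and the final row normalization that libraries usually apply is left out to keep the arithmetic easy to follow. The function names are hypothetical.

```python
import math
from collections import Counter


def count_matrix(docs):
    """Build a sorted vocabulary and a document-term count matrix from tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_matrix(rows):
    """Weight raw counts by a smoothed inverse document frequency (no normalization)."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: in how many documents does each term occur at least once?
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return [[tf * weight for tf, weight in zip(row, idf)] for row in rows]
```

With two tiny documents, ["rice", "cumin", "rice"] and ["rice", "butter"], the word "rice" occurs in both and gets an IDF of exactly 1, while "cumin" and "butter" each occur in one document and get a higher weight.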

To get the highest accuracy from our models, we tried both the CountVectorizer and TF-IDF vectorization methods. We also experimented with n-grams, which consider sequences of adjacent words instead of single words, to capture more contextual information from the text.
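As a small illustration of what n-grams add, the sketch below expands a token list into unigrams and bigrams. The function name word_ngrams and its default range are assumptions for this example.

```python
def word_ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, shorter n-grams first."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams
```

With bigrams enabled, "soy sauce" becomes a feature of its own instead of being split into two unrelated words, which is the kind of context single-word features miss.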

We experimented with several NLP models; the results are displayed in the chart on the left. The Random Forest model suffered from overfitting, as indicated by the significant difference between the training and test accuracies.

Among the models tested, Multinomial Naive Bayes performed best, achieving a test accuracy of 74%. This model used the TF-IDF transformation without n-grams. Further tuning with Grid Search CV over various parameter combinations did not help: the accuracy dropped to 71%.

Therefore, the Multinomial Naive Bayes model with TF-IDF transformation emerged as the most effective in this project, offering satisfactory accuracy.
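For readers who want to see what a multinomial Naive Bayes classifier does internally, here is a from-scratch sketch with Laplace smoothing. It works on raw token counts, not the TF-IDF weights used by the best model in this project, and the class name TinyMultinomialNB and the helper accuracy are hypothetical; a real project would more likely use a library implementation.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists, with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes_ = sorted(set(labels))
        self.vocab_ = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # Log prior: share of training documents that belong to each class.
        self.log_prior_ = {c: math.log(class_counts[c] / len(labels)) for c in self.classes_}
        token_counts = {c: Counter() for c in self.classes_}
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        vocab_size = len(self.vocab_)
        self.log_likelihood_ = {}
        for c in self.classes_:
            denom = sum(token_counts[c].values()) + self.alpha * vocab_size
            self.log_likelihood_[c] = {
                tok: math.log((token_counts[c][tok] + self.alpha) / denom)
                for tok in self.vocab_
            }
        return self

    def predict_one(self, doc):
        def score(c):
            # Tokens never seen during training are ignored.
            return self.log_prior_[c] + sum(
                self.log_likelihood_[c][tok] for tok in doc if tok in self.vocab_
            )
        return max(self.classes_, key=score)

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Computing accuracy separately on the training set and on a held-out test set is how an overfitting gap like the one seen with Random Forest shows up.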

Topic Modelling

Topic modeling is an effective method for grouping documents based on their content. In this project, we utilized Latent Dirichlet Allocation (LDA), a popular technique for topic modeling. By applying LDA to the "Explanations" column, we aimed to understand the different topics related to the dishes.

Following a similar preprocessing approach as mentioned earlier, we tokenized the text and extracted only the nouns and adjectives. Then, we transformed the text into vectors using CountVectorizer and examined the resulting topics.

After evaluating different topic models, we found that the model with three topics yielded the most meaningful results. Here is a brief overview of the identified topics:

Topic 1: Ingredients and Cooking Techniques - This topic focuses on discussions related to various ingredients used in cooking, as well as different cooking methods and techniques employed in preparing the dishes.

Topic 2: Cultural and Regional Influences - This topic revolves around the cultural and regional aspects of different cuisines. It includes discussions about traditional cooking styles, local ingredients, and specific dishes associated with certain regions or cultures.

Topic 3: Flavor Profiles and Seasonings - This topic explores the flavor profiles of dishes, highlighting the use of specific seasonings, spices, and flavors to enhance the taste and aroma of the prepared meals.

By analyzing the topics generated by the LDA model, we can gain insights into the different aspects and themes present in the explanations of the dishes, helping us understand the content more effectively.
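To show the mechanics behind LDA, here is a compact sketch that fits the model with collapsed Gibbs sampling in plain Python. This is a swapped-in inference method chosen because it fits in a few dozen lines; the post does not say which library or inference algorithm was used, and the hyperparameters (alpha, beta, n_iter) and function names are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (vocab, doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (document, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

The topics come back as unlabeled word lists; naming a topic is a manual step done by reading its top words.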

Topic 0 corresponds to healthy food

Topic 1 corresponds to desserts

Topic 2 corresponds to Mexican food

Thanks a lot for reading our post! For more details, you can contact Actowiz Solutions now! Ask us about all your mobile app scraping and web scraping service requirements.
