Rocío del Campo-Pedrosa¹, Diego del Campo-Pedrosa¹, Bettina Merlin² and Ana González-Marcos¹
¹ Department of Mechanical Engineering, Universidad de La Rioja, Logroño, La Rioja, Spain
² Fakultät International Business, Hochschule Heilbronn, Heilbronn, Germany
Traditional sensory analysis in food innovation provides limited insight into consumer behavior, whereas social platforms such as Reddit offer large-scale, real-time textual data on food-related practices and perceptions. This study evaluates Reddit as a scalable source for detecting food trends and healthy eating patterns in Spanish-language discussions using artificial intelligence (AI) and natural language processing (NLP). An end-to-end pipeline was implemented, including targeted data scraping across seven food-related domains, Spanish-language filtering (≥70% confidence), customized preprocessing, and unsupervised topic discovery via k-means clustering. The system processed 17,774 Spanish-language posts from an initial corpus of 92,949 entries. Despite linguistic challenges such as polysemy and lemmatization errors, the method produced coherent and representative themes, including barriers to home cooking, weight management concerns, economic factors, food categories, and nutrition-related consultations. These results demonstrate the effectiveness of unsupervised NLP techniques for large-scale monitoring of food-related discourse on social media.
Keywords: Natural Language Processing, Unsupervised Learning, Social Media Mining, Artificial Intelligence.
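The filtering and clustering steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the language-detector output is assumed to be precomputed as hypothetical `(text, lang, confidence)` records, TF-IDF weighting is an assumed feature representation (the abstract does not name one), and the k-means routine is a toy version with a deterministic start. Only the 0.70 confidence threshold and the use of k-means come from the abstract; the example posts are invented for illustration.

```python
import math
from collections import Counter

MIN_CONFIDENCE = 0.70  # Spanish-language threshold stated in the abstract


def filter_spanish(records, min_conf=MIN_CONFIDENCE):
    """Keep texts labelled 'es' with confidence >= min_conf.

    Each record is (text, lang, confidence); the detector that produced
    these scores is assumed and not shown here.
    """
    return [text for text, lang, conf in records
            if lang == "es" and conf >= min_conf]


def tfidf(docs):
    """Return L2-normalised TF-IDF vectors (assumed representation)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def kmeans(vectors, k, iterations=20):
    """Toy k-means: the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Invented example records: (text, detected language, detector confidence).
records = [
    ("no tengo tiempo para cocinar en casa", "es", 0.98),
    ("quiero bajar de peso con dieta", "es", 0.95),
    ("I love pizza", "en", 0.99),
    ("cocinar en casa es barato", "es", 0.91),
    ("dieta para bajar de peso rapido", "es", 0.88),
    ("ok jaja", "es", 0.41),
]

posts = filter_spanish(records)
labels = kmeans(tfidf(posts), k=2)
for post, label in zip(posts, labels):
    print(label, post)
```

In this toy run the English post and the low-confidence post are dropped, and the remaining four posts separate into a home-cooking group and a weight-management group. A real pipeline would add the customized preprocessing the abstract mentions (for example stop-word removal and lemmatization) and a principled choice of the number of clusters.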