Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Average
Reviewer's confidence: High
Significance: Low significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
The paper "Are Food Ingredient Social? An Empirical Investigation" explores the application of social network analysis (SNA) techniques to ingredient networks (InN). Using two recipe datasets, INDoRI (a dataset of 5187 Indian recipes covering 18 cuisines, previously published by the authors) and a dataset curated from Yummly (from which 9 international cuisines were selected based on popularity), the authors construct networks in which nodes represent ingredients and edges represent co-occurrence within recipes. The paper aims to demonstrate that these InNs exhibit properties similar to those of social networks, such as power-law degree distributions and community structure. The authors employ various network metrics, including degree distribution, distance, diameter, density, clustering coefficient, closeness centrality, and eigen centrality, to support their claims.
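For readers unfamiliar with the construction summarized above, it can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy recipes and the function name `build_cooccurrence_network` are hypothetical, and the edge weight is assumed to be the number of recipes in which two ingredients appear together.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_network(recipes):
    """Return (edge_weights, degree) for an ingredient co-occurrence network.

    Nodes are ingredients; an undirected edge links two ingredients that
    appear in the same recipe. The edge weight counts shared recipes
    (an assumption; the paper may weight edges differently).
    """
    edge_weights = Counter()
    for ingredients in recipes:
        # Deduplicate and sort so each unordered pair has one canonical key.
        unique = sorted(set(ingredients))
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1

    # Unweighted degree: number of distinct neighbours per ingredient.
    degree = Counter()
    for a, b in edge_weights:
        degree[a] += 1
        degree[b] += 1
    return edge_weights, degree

# Hypothetical toy recipes, for illustration only.
toy_recipes = [
    ["onion", "tomato", "cumin"],
    ["onion", "garlic", "tomato"],
    ["rice", "cumin"],
]
edge_weights, degree = build_cooccurrence_network(toy_recipes)
```

The degree sequence produced this way is the input to the degree-distribution claims evaluated later in this review.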
Reasons to accept:
• Strengths:
◦ Interesting Research Direction: The exploration of analogies between ingredient networks and social networks is a potentially interesting research direction.
◦ Aggregated Datasets: The use of both a specialized Indian cuisine dataset (INDoRI) and a more general international cuisine dataset from Yummly allows for cross-cultural comparisons.
◦ Nice Background Details: The "Related Work" section provides a good overview of complex networks, scale-free networks, small-world networks, and their applications in various domains. The authors justify the study by noting that the scale-free nature of ingredient networks has not previously been investigated, which is a strong point.
Reasons to reject:
• Major Weaknesses:
◦ Lack of Interpretation and Justification: The paper presents a lot of quantitative data but fails to provide sufficient interpretation or justification.
◦ Gamma Range: The range of gamma values (1.96-2.38) is mentioned, but its significance in the context of food is not explained. How does this compare to other networks?
◦ Maximum Distance: While different maximum distances across cuisines are noted, no culinary explanations are offered.
◦ Micro Metrics: The choice of micro metrics is not justified, and their results are not interpreted in a meaningful way. What do the closeness and eigen centrality values imply about ingredient usage?
◦ Community Structure: The paper mentions the number of communities found by different algorithms but fails to provide examples of these communities or explain their culinary significance. This is a crucial flaw. Simply finding clusters does not prove a meaningful "community structure" analogous to social networks. How do these communities relate to known culinary practices or ingredient pairings? The paper does not address the crucial difference between simple co-occurrence clustering and true community structure.
◦ Table Interpretation: Both Table 3 and Table 4 are presented with minimal analysis. For instance, the discussion of Table 4 merely restates the numbers without explaining the observed differences between cuisines. Why does Southern US cuisine have so many more communities according to Leiden?
◦ Weak Justification for "Social Behaviour": The paper claims that InNs exhibit "social behaviour" but does not adequately justify this claim. Simply showing that InNs have properties like power-law distributions and community structure is not enough. Many non-social networks also exhibit these properties. The analogy to social networks feels forced and lacks a strong theoretical basis. The paper does not distinguish its findings from simple frequency-based co-occurrence or topic modelling approaches for data clustering.
Overall Recommendation:
The paper has an interesting premise, but the superficial interpretation of the results, the weak justification for the "social behaviour" claim, and the missing methodological details significantly weaken the paper. The paper presents a lot of data but fails to provide a compelling narrative or convincing evidence for its central claim. Based on this, my recommendation is “Undecided” to give the authors an opportunity to revise and strengthen the paper.
The authors are encouraged to provide concrete examples of ingredient communities, discuss the culinary relevance of the findings, and fully explain each metric using these examples. This will significantly strengthen the paper and provide more compelling evidence for the central claim.
Nanopublication comments:
Further comments:
• Significance:
The paper aims to demonstrate that ingredients in recipes exhibit similar associative patterns to social behaviour observed in social networks. This is an interesting research direction that, if properly executed, could potentially contribute to the field of food computing and offer new perspectives on culinary practices. The idea of applying social network analysis to ingredient data could lead to insights into ingredient pairings, culinary traditions, and even the evolution of cuisine.
However, the current manuscript has a significant weakness in establishing the motivation and significance of this study. The authors do not adequately explain why it is important to understand whether ingredient networks exhibit "social behaviour." What are the potential applications or benefits of this knowledge? How does it advance the field of data science or, more specifically, food computing?
The paper needs to clearly articulate the potential impact of this research. Some possible areas of impact, which the authors could explore and develop, include:
◦ Recipe Recommendation: Understanding ingredient relationships could lead to more effective recipe recommendation systems that go beyond simple co-occurrence and consider more complex culinary patterns.
◦ Culinary Trend Prediction: Analyzing changes in ingredient networks over time could help predict emerging culinary trends and identify new ingredient combinations.
◦ Cross-Cultural Culinary Analysis: Comparing ingredient networks across different cuisines could reveal cultural influences and similarities or differences in culinary practices.
◦ Food Product Development: Insights from ingredient networks could be used to inform the development of new food products and flavour combinations.
◦ Nutritional Analysis: Combining ingredient network analysis with nutritional data could provide insights into the nutritional properties of different cuisines and identify potential nutritional deficiencies or imbalances.
Currently, the paper focuses on demonstrating the presence of network properties like power-law distributions and community structure but fails to connect these findings to any meaningful real-world applications or benefits. Simply showing that ingredient networks have some structural similarities to social networks is not enough to establish the significance of the work. The authors need to clearly articulate the "so what?" of their research.
Furthermore, the paper does not adequately distinguish its approach from other existing methods such as topic modeling or simple frequency-based co-occurrence analysis. It is not clear how the network-based approach provides any significant advantages over these more established methods. The authors need to address this directly and explain the unique contributions of their work.
Background:
The paper provides a reasonable overview of foundational concepts in social network analysis, including scale-free networks, small-world networks, and preferential attachment. The authors cite relevant literature on the application of SNA to various domains, such as co-authorship networks, research collaboration networks, human interaction networks, and social media platforms (e.g., Twitter and Facebook). This demonstrates a basic understanding of the core principles of SNA.
However, the paper's connection to the existing literature on ingredient network analysis is significantly weaker. While some relevant papers are cited, the authors fail to adequately contextualize their work within this specific subfield. This is a major shortcoming, as it leaves the reader wondering how this work builds upon or differs from previous research on ingredient networks.
Specifically, the paper should more thoroughly discuss and relate its findings to the following works:
• Ahn, Y. Y., Ahnert, S. E., Bagrow, J. P., & Barabási, A. L. (2011). Flavor network and the principles of food pairing. Scientific reports, 1(1), 196.: This paper explores the relationship between flavor compounds and ingredient pairings, constructing a "flavor network" based on shared flavor molecules. The authors of the reviewed paper should discuss how their co-occurrence-based InN differs from Ahn et al.'s flavor-based network. Do these different approaches lead to similar or different conclusions about ingredient relationships? How do the network properties (e.g., community structure) compare? This is crucial for understanding the novelty of the current work.
• Teng, C. Y., Lin, Y. R., & Adamic, L. A. (2012, June). Recipe recommendation using ingredient networks. In Proceedings of the 4th annual ACM web science conference (pp. 298-307).: This paper focuses on using ingredient networks for recipe recommendation. The reviewed paper should discuss how its analysis relates to recipe recommendation. Could the findings about scale-free properties and community structure be used to improve recommendation algorithms? How does this work compare to Teng et al.'s approach?
• Shirai, S. S., Seneviratne, O., Gordon, M. E., Chen, C. H., & McGuinness, D. L. (2021). Identifying ingredient substitutions using a knowledge graph of food. Frontiers in Artificial Intelligence, 3, 621766.: This work uses a knowledge graph to identify ingredient substitutions. The reviewed paper should discuss how its approach compares to a knowledge graph-based approach. Are there complementary insights that can be gained from both methods? How does the focus on co-occurrence compare to the explicit knowledge representation in Shirai et al.'s work?
• Cheng, X., Lin, S. Y., Wang, K., Hong, Y. A., Zhao, X., Gress, D., ... & Xue, H. (2021). Healthfulness assessment of recipes shared on Pinterest: natural language processing and content analysis. Journal of Medical Internet Research, 23(4), e25757: While this paper focuses on healthfulness assessment, it also involves analysis of recipe content. The reviewed paper should briefly discuss how its network-based approach relates to content analysis methods like natural language processing, which are used by Cheng et al.
By not adequately addressing these related works on ingredient networks, the reviewed paper fails to establish its place within the existing research landscape. It's crucial for the authors to explain how their work contributes new insights beyond what has already been established in the field.
While the paper provides a reasonable background on general SNA concepts, it lacks a sufficient connection to the specific literature on ingredient network analysis. The authors need to significantly expand this section by discussing and comparing their work to existing research on ingredient networks, particularly the papers mentioned above.
Novelty:
The paper's novelty is a mixed bag. While the application of network analysis to various domains, including ingredient networks, is not new in itself, the paper presents two potential sources of novelty:
◦ INDoRI Dataset: The creation and presentation of the INDoRI dataset, a collection of 5187 Indian recipes covering 18 diverse Indian cuisines, can be considered a substantial contribution. If this dataset is indeed comprehensive and well-curated, it could be a valuable resource for future research in food computing and culinary analysis. The authors should emphasize the unique characteristics of this dataset and how it compares to existing recipe datasets. For example:
▪ What is the size and scope of other publicly available Indian recipe datasets (if any)?
▪ Does INDoRI include metadata or attributes beyond ingredients (e.g., preparation time, nutritional information, regional origin)?
▪ How was the data collection and cleaning process performed to ensure data quality?
If the INDoRI dataset offers unique advantages or fills a gap in existing resources, this would significantly strengthen the paper's novelty. It is commendable that the authors have provided a link to the dataset.
◦ Combined Analysis: The analysis performed on the aggregated dataset, which includes INDoRI and a dataset from Yummly covering international cuisines, could also be considered a novel aspect. Comparing ingredient network properties across different cuisines (Indian vs. international) could reveal interesting cultural and culinary insights. However, the current analysis is quite superficial and fails to fully exploit this potential. To enhance the novelty of this combined analysis, the authors should:
▪ Provide a more detailed comparison of the network properties across different cuisines. What are the key differences and similarities? What culinary factors might explain these differences?
▪ Explore how the network structure reflects cultural differences in ingredient usage and culinary traditions.
▪ Consider more advanced comparative analysis techniques to identify statistically significant differences between cuisines.
Weaknesses Regarding Novelty:
• Lack of New Methods: The paper does not introduce any new methods for network analysis or ingredient network construction. It relies on standard SNA metrics and community detection algorithms. This limits the methodological novelty of the work.
• Superficial Analysis: The analysis performed on the combined dataset is currently too superficial to be considered a significant contribution. Simply calculating and comparing basic network metrics is not enough. The authors need to provide a deeper analysis and interpretation of the results to extract meaningful insights.
• Missing Comparison to Existing Ingredient Network Research: As already mentioned, the paper does not adequately compare its findings to existing research on ingredient networks. This makes it difficult to assess the true novelty of the work.
Technical quality:
While the paper is generally well-written in terms of language, it lacks crucial details regarding the methods used for network analysis, making it difficult to assess the validity and rigour of the results.
I note the following methodological gaps:
• Network Analysis Methods: The paper mentions using standard network metrics (degree distribution, distance, diameter, density, clustering coefficient, closeness centrality, and eigen centrality), but provides little detail on how these metrics were calculated. Were standard libraries used (e.g., NetworkX in Python)? Were any specific parameters or settings used? This lack of detail hinders reproducibility.
• Community Detection Algorithms: The paper mentions using weighted versions of Leiden, Louvain, and WABCD algorithms, but provides insufficient information about their implementation.
◦ Were standard implementations used, or were they modified in any way?
◦ What parameters were used for each algorithm?
◦ The paper mentions that WABCD is described in [45], but this paper should be self-contained.
◦ Critically, the paper does not adequately justify the choice of these specific algorithms. Why were these algorithms chosen over other community detection methods? What are the strengths and weaknesses of each algorithm in the context of ingredient networks?
• Lack of Comparison to Other Clustering Methods: The paper does not discuss how its network-based community detection compares to other clustering methods, such as topic modeling. This is a crucial omission. Topic modeling is a common technique for analyzing text data, including recipe descriptions, and can also be used to identify groups of related ingredients. The authors could:
◦ Discuss the similarities and differences between network-based community detection and topic modeling in the context of ingredient analysis.
◦ Explain why they chose a network-based approach over topic modeling or other clustering methods.
◦ Ideally, they should perform a comparison between network-based communities and topic-based clusters to demonstrate the advantages (if any) of their approach.
• Statistical Analysis: The paper makes claims about power-law distributions but doesn't provide any statistical measures of fit (e.g., R-squared values, p-values). This makes it impossible to assess the statistical significance of these claims. Similarly, the paper doesn't provide any statistical comparisons between different cuisines.
• Interpretation of Results: Even if the methods were adequately described, the interpretation of the results is weak. The paper presents numbers without providing sufficient culinary context or explanation.
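To make the request for statistical measures of fit concrete, the kind of check intended can be sketched as below. This is an illustrative sketch under stated assumptions rather than a prescribed method: it uses the continuous maximum-likelihood estimator for the exponent and a Kolmogorov-Smirnov distance as the goodness-of-fit measure, the function name `fit_power_law` is hypothetical, and the sample is synthetic rather than taken from the paper. Since degrees are integers, a discrete estimator (or an established package such as `powerlaw`) would likely be more appropriate for the actual analysis.

```python
import math
import random

def fit_power_law(values, xmin):
    """Estimate the exponent gamma of p(x) ~ x^(-gamma) for x >= xmin.

    Uses the continuous maximum-likelihood estimator
    gamma = 1 + n / sum(ln(x_i / xmin)) and returns it together with the
    Kolmogorov-Smirnov distance between the data and the fitted model.
    """
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    gamma = 1.0 + n / log_sum

    # KS distance: largest gap between empirical and fitted CDFs.
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - gamma)
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gamma, ks

# Synthetic stand-in for a degree sequence (not data from the paper):
# inverse-transform samples from a power law with exponent 2.2.
rng = random.Random(0)
true_gamma, xmin = 2.2, 1.0
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_gamma - 1.0))
          for _ in range(5000)]
gamma_hat, ks_distance = fit_power_law(sample, xmin)
```

Reporting the estimated exponent together with a distance of this kind, and ideally a bootstrap p-value and a comparison against alternative heavy-tailed distributions, would let readers judge whether the reported gamma range of 1.96-2.38 reflects a genuine power law.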
Presentation:
The paper is generally well-written in terms of language, but several presentational issues detract from its clarity and overall quality.
I note the following issues:
• Contradictory statements on INDoRI Dataset: There's a clear contradiction on page 2. Lines 30-32 suggest that the paper introduces the INDoRI dataset, while lines 41-42 state that it was presented in a previous publication [45]. This needs to be clarified. If INDoRI was previously published, this paper should clearly state that it is using or extending the previously published dataset, not introducing it. This contradiction undermines the perceived novelty of the work.
• Redundant data cleaning section: Given that the INDoRI dataset and its cleaning process are already described in [45], the detailed explanation of the cleaning process in Section 3.3 is largely redundant. The authors should briefly summarize the cleaning steps and refer the reader to [45] for more details.
• Missing context for Stop Words: While Table 2 shows ingredient stop words (ISW), it would be much more helpful to provide examples of these stop words in context before cleaning. For example, instead of just listing "chopped," the authors could provide an example like "1 cup of chopped onions" and then show how it is reduced to "onions" after cleaning. This would make the purpose and effect of the ISW filtering much clearer.
• The phrase "examples of such words can be found in referenced Table 2" should be simplified to "examples of such words can be found in Table 2."
• Inconsistent Number Formatting: The inconsistent use of numerical (e.g., 9) and textual (e.g., nine) representations of numbers should be corrected. The authors should consistently use numerical representations for numbers greater than ten and follow a consistent style guide.
• Inconsistent cuisine count: The discrepancy between mentioning 9 cuisines in line 42 and 10 cuisines in line 43 on page 4 needs to be resolved. The authors should clearly state that they used 9 cuisines from Yummly in addition to the Indian cuisine from INDoRI, resulting in a total of 10 cuisines.
• Impact of using external data: The authors mention that they used 9 cuisines from the Yummly dataset, which, I believe, overshadows the contribution of INDoRI, supposedly their main contribution. The authors need to address this directly. For instance, they might find it beneficial to:
◦ Clearly explain the rationale for using the Yummly dataset.
◦ Emphasize the unique contributions of INDoRI, even within the combined analysis.
◦ Consider performing separate analyses on INDoRI to highlight its specific characteristics.
• On page 3, lines 41-45, the authors wrote: “One of them is to compilling recipes that span diverse cultural…..” The authors should consider changing “compilling” to “compile”.
• Discussion: The authors provide different evaluation metrics without adequately discussing their implications. Each metric should be explained in the context of ingredient networks, and the results should be interpreted accordingly.
• Irrelevant Introduction: The introduction's lengthy discussion of general social network concepts without a clear connection to the specific work on ingredient networks is a concern for me. The introduction should be focused on motivating the study of ingredient networks and establishing the research question. The general background on social networks should be significantly shortened and integrated more seamlessly into the context of ingredient analysis.
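The before-and-after illustration requested above for the ingredient stop words (ISW) could look like the following sketch. It is only an illustration under stated assumptions: the stop-word set and the function name `clean_ingredient` are hypothetical, and the paper's Table 2 defines the actual ISW list.

```python
import re

# Hypothetical stop-word set for illustration; not the paper's Table 2.
INGREDIENT_STOP_WORDS = {
    "cup", "cups", "of", "chopped", "tbsp", "tsp", "finely", "fresh",
}

def clean_ingredient(raw):
    """Strip quantities, units, and preparation words from an ingredient line."""
    # Keep alphabetic tokens only, which drops numbers and punctuation.
    tokens = re.findall(r"[a-z]+", raw.lower())
    kept = [t for t in tokens if t not in INGREDIENT_STOP_WORDS]
    return " ".join(kept)

before = "1 cup of chopped onions"
after = clean_ingredient(before)  # "onions"
```

Showing a handful of such raw and cleaned pairs next to Table 2 would make the purpose and effect of the filtering clear to readers.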
meta-review by editor
Submitted by Tobias Kuhn on
The paper describes two recipe ingredient datasets comprising ten worldwide cuisines and the Ingredient Network (InN) constructed from them. An empirical investigation into the InN is conducted, and its resemblances to social networks are described. There are a number of significant changes needed in order to improve the manuscript:
Brian Davis (https://orcid.org/0000-0002-5759-2655)