Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Average
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)
Summary of paper in a few sentences:
This is an important study addressing a critical gap in our knowledge about slum populations. Specifically, this work has developed a dataset on women in slums, with a specific focus on the Lucknow community in India. This work follows traditional social science principles in data collection and is both valuable and timely, given the number of studies on slum populations globally, which sometimes give little consideration to gender disparities.
Reasons to accept:
The data is valuable.
Reasons to reject:
Not being a social scientist, I don't see where the 'data science' comes into play in this work. As someone more on the quantitative side, I would reject this manuscript given the journal's focus. However, I leave this decision to the Editor.
Nanopublication comments:
Further comments:
Overall
• This is an important study addressing a critical gap in our knowledge about slum populations. Specifically, this work has developed a dataset on women in slums, with a specific focus on the Lucknow community in India. This work follows traditional social science principles in data collection and is both valuable and timely, given the number of studies on slum populations globally, which sometimes give little consideration to gender disparities. For the most part, my comments are quite general, but my main query concerns the applicability of this work to this “Data Science” journal, given that few data science methods or tools, if any, are used. This is, of course, for the Editor to decide. In my view, there may be a more suitable journal.
• While the study is principally about curating a dataset for use, a map of the study location would help. This could show the location (general or absolute) of the slums with respect to Lucknow. For those, including myself, not familiar with the location, this would provide some context. The map could be part of a Study Area section that provides additional contextual details about the area.
• Several variables in the curated dataset consist mostly of NA values. I wonder whether these variables are then useful, or whether they should be removed altogether.
• I’m missing a concluding summary section after the description of the data; the paper ends quite abruptly. Even though the focus is on producing a dataset, this is still an academic journal.
• A few additional points follow.
Introduction
• Could be bolstered with additional support from the literature.
• Population figures are given in lakhs (1 lakh = 100,000). While this is fine for readers in India and neighbouring countries, many readers elsewhere will not be immediately familiar with this unit. I suggest converting these figures to a more widely recognised unit (e.g., thousands or millions) for a wider readership.
Methods
• The first sentence of the first paragraph seems better suited to the end of the Introduction section.
• This section is really about the value of the data and has nothing to do with methodology; here one would expect to see survey methods, etc. I think all of these points fit better within the remit of the Data section, or could be woven into the Introduction section.
Sampling procedure
• “For the selection of wards, initially, the slum colonies were put in descending order of their population size. Arithmetic mean of the whole distribution was then computed which led to its classification into two groups. The mean of each of the two groups further led to creation of four groups with lesser intervals.”
o I’m not sure how the two groups were created based on the mean. Was one group within one standard deviation of the mean and the second group everything beyond this? More information is needed, especially since, depending on the distribution of values, the mean can be a somewhat biased measure. Other than this, I think the sampling procedure was fair.
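To illustrate why the description is ambiguous, here is a minimal Python sketch of one possible reading: split the slum populations at the overall mean, then split each resulting group at its own mean to obtain four groups. The function name `mean_split_strata`, the rule that values equal to a mean go to the upper group, and the population figures are all my assumptions, not taken from the manuscript.

```python
from statistics import mean


def mean_split_strata(populations):
    """One possible reading of the mean-based stratification (an assumption).

    Populations are sorted in descending order and split at the overall
    arithmetic mean into an upper and a lower group; each group is then
    split at its own mean, giving four strata.
    """
    values = sorted(populations, reverse=True)
    overall = mean(values)
    # Assumption: values equal to the mean are placed in the upper group.
    upper = [v for v in values if v >= overall]
    lower = [v for v in values if v < overall]

    strata = []
    for group in (upper, lower):
        if not group:
            # The lower group is empty when all populations are identical.
            strata.extend([[], []])
            continue
        group_mean = mean(group)
        strata.append([v for v in group if v >= group_mean])
        strata.append([v for v in group if v < group_mean])
    return overall, strata


# Made-up, right-skewed population figures (not from the manuscript).
overall, strata = mean_split_strata([100, 200, 300, 400, 2000])
print(overall)  # 600
print(strata)   # [[2000], [], [400, 300], [200, 100]]
```

With one very large slum, the overall mean (600) exceeds four of the five values, so the upper group has a single member and one of the four strata ends up empty. This skew-driven effect is why I would like the authors to state the exact splitting rule and comment on whether the mean was an appropriate cut-off.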
Data collection and survey execution
• I was wondering whether there was a specific reason for collecting the data in two different time windows, i.e., March to May and August to October. I think a statement should be made here, whether the reason was resource limitations, the need for support from safai nayaks, etc. At the very least, a statement should be made about changes in the slum population between these two time windows, so that users of the data can judge its fitness for their specific uses.
• How were the specific primary variables chosen? One could argue that other data may have been more relevant. Were these based on specific themes, secondary data availability, etc.?
Socio-demographic parameters
• “The median age of the respondents is 44 years. Majority of the surveyed households (56.67%) are Hindus and in terms of caste, a significant proportion of minority communities were found to be living in the slums.”
o Being unfamiliar with minorities within the caste system, I don’t know what this means. Could some examples of minority castes be provided, or a link to additional information?
o “…followed by SC and ST communities”: what do SC and ST refer to here?
meta-review by editor
Submitted by Tobias Kuhn on
The two reviewers agree that the resource described in the paper is potentially valuable and described in a comprehensible way, but that in its current form, its relevance to the journal is unclear at best. To be accepted in the Data Science journal, the authors would need to make explicit how this resource can be of value to the larger data science community. The authors could do this by providing some examples or use cases, preferably supported by data science literature. Additionally, the authors could add a section summarizing the dataset more clearly (following reviewer 2's suggestion) and discuss limitations (reviewer 1). This would also provide proper data-science-related contextualisation for the resource.
Victor de Boer (https://orcid.org/0000-0001-9079-039X)