Reviewer has chosen not to be Anonymous
Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Average
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
The paper presents an approach and experiments for ontology-based topic detection and labelling, in which topic labels are constructed using the DBpedia hierarchy. This should lead to more human-interpretable topic labels. The approach is evaluated on two different datasets.
Reasons to accept:
Interesting approach to topic labelling
Reasons to reject:
The authors state that this paper is an extension of their 2015 ICMLA paper, but the contributions of this paper were already largely covered in that paper. Tables 4, 9 and 10 and Figure 8 are already present in that paper, and Tables 7 and 8 seem to have been extended only somewhat. Furthermore, the results presented in Table 10 are completely identical. It is therefore unclear to me what exactly the present paper adds to the state of the art besides some additional examples, some more related work and a more elaborate explanation of the approach. I recommend that the authors be more specific about their additional contributions, and would advise them to evaluate their approach on additional datasets.
One drawback of the approach seems to be that a preselection of ontology classes needs to be made on which to base the classification (Section 7.1). How well does this generalise to other domains, and how labour-intensive is it? How was this pre-selection made? Did you experiment with different selection methods?
Furthermore, it would be good to share the data and system with the research community to give others an opportunity to build upon the work.
Nanopublication comments:
Further comments:
- "In this paper we will use collapsed Gibbs sampling procedure for OntoLDA topic model" -> it would be good if the authors motivate why they chose this approach
- What was the inter-annotator agreement on the quantitative evaluation? (A minimal sketch of how such agreement could be reported follows this list.)
- The DBpedia ontology at present contains 685 classes (http://wiki.dbpedia.org/services-resources/ontology); where do the 5,000,000 concepts mentioned in Section 7.1 come from?
- There are some typos and missing or misplaced articles in the text; the paper would benefit from a thorough proofread.
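Regarding the inter-annotator agreement question above: a common way to report agreement for a two-annotator labelling task is Cohen's kappa. The following minimal Python sketch is purely illustrative; the binary "good/bad label" scale and the annotator judgements are hypothetical assumptions, not taken from the paper, whose exact evaluation setup is not described in this review.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two annotators agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical judgements (1 = good topic label, 0 = bad) from two annotators.
annotator_1 = [1, 1, 0, 1, 0, 1, 1, 0]
annotator_2 = [1, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.467

Values in the 0.41-0.60 range are usually read as moderate agreement; reporting such a figure would make the quantitative evaluation easier to interpret.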
2 Comments
About extension guidelines
Submitted by Victor de Boer on
(This comment is based on an email exchange, which I copy here for transparency.)
After checking with the editors-in-chief, we came to the following preliminary guidelines (they will actually be made explicit on the submission page in the near future, so thanks for kickstarting this!)
Based on this, preliminary guidelines were published on the Data Science journal author page: https://datasciencehub.net/content/guidelines-authors
Meta-Review by Editor
Submitted by Tobias Kuhn on
Your manuscript was carefully reviewed by two reviewers, who both point out that, although this is an interesting paper and an interesting approach, the main problem is that the current version is too similar to the previously published paper "Automatic Topic Labeling using Ontology-based Topic Models". Based on this consensus, I decided not to wait for a third review to come in, but to decide based on these two reviews.
Most journals, including Data Science, accept extended versions of workshop or conference papers, provided that a) the paper clearly states that it is an extended version and clarifies how it extends the previous work, and b) there is a significant amount of new content and additional elements (not just explaining the same things with more words). After discussion in the editorial board, it was decided to make these guidelines explicit on the journal's information pages. The minimum amount of new content was set at 25%.
I would therefore like to encourage you to resubmit this paper to this journal if you are able to extend the paper significantly and address the other issues mentioned in the reviews.
Victor de Boer (http://orcid.org/0000-0001-9079-039X)