Reviewer has chosen not to be Anonymous
Overall Impression: Accept
Technical Quality of the paper: Clear novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)
Summary of paper in a few sentences:
This paper presents a new methodology that the authors call EDAM (Expert-Driven Automatic Methodology) that supports the process of generating systematic reviews. The method is applied to the domain of software engineering in the paper, but is clearly applicable beyond this domain.
The method attempts to automatize some steps of the systematic review process for two reasons: i) to reduce the burden of manual work involved in selecting relevant primary papers, and ii) to make the selection process more objective. The paper proposes an ontology-based approach that supports the steps of i) selection of relevant papers, ii) keywording, and iii) creation of a classification schema. For this purpose, the authors build on an existing ontology learning method that has been specifically designed for learning topic hierarchies in the scientific domain. The method proposed in the paper is briefly evaluated on a case study in software engineering. Further, the accuracy of the approach in classifying the primary literature is evaluated by comparing it to a number of annotators.
Reasons to accept:
The paper tackles an important problem, that is the problem of supporting the process of generating systematic reviews. It proposes to automatize some parts of the task, in particular the one of selecting relevant papers, filtering them down and classifying the papers into topics.
The method is based on an ontology learning approach that generates a hierarchy of relevant concepts starting from a seed term. Experts can then interact with the ontology that is displayed in terms of a tree diagram in Excel. The fact that experts can directly refine the ontology is a very positive aspect of the method.
The method is evaluated on a use case in software engineering and limitations are discussed. The methodology for evaluation is sound, empirically evaluating the classification step of the approach by comparing automatically generated annotations to those of a number of annotators, some of whom are experts in the domain. The paper describes some typical case studies that the methodology can support as well as limitations (Section 5.2).
Overall, the proposed methodology is novel and sound.
Reasons to reject:
No major reason to reject the paper, see below some comments to improve the paper.
There is no evaluation of the usability of the tool for interacting with the ontology. The paper mentions that the experts were able to modify the ontology, but does not say anything about how usable the Excel-based editing process was from the point of view of experts.
The paper does not explain the ontology learning approach used. It references an existing method, Klink-2, that was developed by the same authors. To make the paper self-contained it would be good to have a concise description of how this ontology learning algorithm works.
It is not clear what language can be used to specify filter criteria. It is clear from the paper that the filtering is based on matching concepts in the ontology over the set of papers. However, it is not clear if one can use operators such as NOT, AND, OR etc. It is not clear if there is a formal language in which a set of matching criteria can be defined.
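To make the point concrete: a sketch of what such a formal filter language could look like, with boolean operators over the set of topics matched against a paper. All names and the expression encoding here are my assumptions for illustration, not EDAM's actual interface.

```python
# Hypothetical sketch of a filter language over ontology concepts.
# An expression is a nested tuple: ("topic", name), ("not", e),
# ("and", e1, e2, ...), or ("or", e1, e2, ...).
def matches(paper_topics, expr):
    """Evaluate a filter expression against the set of topics found in a paper."""
    op = expr[0]
    if op == "topic":
        return expr[1] in paper_topics
    if op == "not":
        return not matches(paper_topics, expr[1])
    if op == "and":
        return all(matches(paper_topics, e) for e in expr[1:])
    if op == "or":
        return any(matches(paper_topics, e) for e in expr[1:])
    raise ValueError(f"unknown operator: {op}")

# e.g. papers on software engineering but not requirements engineering
query = ("and", ("topic", "software engineering"),
                ("not", ("topic", "requirements engineering")))
```

Stating whether such a composable query language exists, or whether criteria are limited to flat concept matching, would clarify the expressiveness of the filtering step.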
The authors mention that any method can be used to map papers into concepts / topics, even a learned classifier. However, it seems that in this paper they rely on a straightforward approach that annotates a paper with a concept from the ontology if that concept or any of its subconcepts appears in the paper. This should be made clear in the evaluation section where the annotations are compared to the set of annotations by experts.
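As I understand it, the labelling rule amounts to something like the following sketch (the ontology encoding and matching by substring are my simplifying assumptions):

```python
# Sketch of the straightforward annotation rule described above: a paper is
# labelled with a concept if the concept or any of its subconcepts occurs
# in the paper's text. The ontology here is a hypothetical toy example.
subconcepts = {
    "machine learning": ["neural networks", "decision trees"],
    "neural networks": [],
    "decision trees": [],
}

def descendants(concept):
    """All subconcepts reachable from `concept`, including the concept itself."""
    out = [concept]
    for child in subconcepts.get(concept, []):
        out.extend(descendants(child))
    return out

def annotate(text, concepts):
    """Return the set of concepts whose descendants appear in the text."""
    text = text.lower()
    return {c for c in concepts
            if any(d in text for d in descendants(c))}
```

Making explicit in the paper that this simple rule (rather than a learned classifier) produced the annotations used in the evaluation would help readers interpret the reported accuracy.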
In this same section, the authors compute agreement in terms of overlap, but ignore that agreement can also occur by chance. Many measures for quantifying inter-annotator agreement, such as Cohen's Kappa, take this into account. I wonder why the authors have not considered using such measures to quantify agreement.
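For reference, chance-corrected agreement for two annotators making binary relevant/irrelevant judgements per paper can be computed as follows (a minimal textbook sketch of Cohen's Kappa, not the authors' code):

```python
# Minimal sketch of Cohen's Kappa for two annotators' label sequences.
def cohen_kappa(a, b):
    """Chance-corrected agreement between two equal-length label lists."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # expected chance agreement, from each annotator's label marginals
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

Unlike raw overlap, this returns 0 when the observed agreement is exactly what the annotators' label frequencies would produce by chance.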
Page 3: aimed to at improving -> aimed at
Page 4: by both  and  -> bad style, do not use references as words
Page 8: a Semantic Web ontology of 58 topics*s*
Page 18: limitations based on the categorization given in  -> do not use references as words
Page 19: requires human expertize -> expertise
Page 20: performance of entity extraction and linking tooks -> tools
Two comments on the title of the paper: as the methodology is not specific to software engineering, I wonder why the authors do not simply choose the title "Reducing the effort for systematic reviews" without the domain qualification. Instead, the authors could add qualifiers to the title that specify which parts of the SR process their method supports. When I first read the title, I did not have a clear idea what aspects of an SR could be supported. This is clarified only later in the introduction, but could be made more specific in the title of the paper.
A second point: I missed the qualifier "supporting systematic reviews" in the name of the methodology. "Expert-Driven Automatic Methodology" does not say anything about the fact that the methodology is supposed to support SRs. This is odd.