I would like to thank all reviewers (R1, R2, R3, R4 herein) for their comments and suggestions, and for spotting typos. Please find below specific answers to the reviewers' comments.
Reviewer 1: Steve Pettifer
R1: I should also confess at the start that I found this paper difficult to read and understand. Much of that is because it requires more knowledge of formalisms than I have; part of it I suspect is because of issues with the article itself.
> I have removed and/or reduced several aspects related to the theoretical explanations of the approach, and added several examples so as to make the way CISE works clearer.
R1: My difficulty with this article — probably a misunderstanding — is that it seems to suggest that this is already achievable (although the suggestion in surrounded by, in my opinion, unnecessarily complex and slightly flakey discussion of Curry-Howard isomorphism) and that the rest of the aims can be achieved by layering on further semantic transformations.
> I have reworded the text so as to say explicitly that the proposed approach (i.e. CISE) and its implementations have been tested on a real set of scholarly articles, but that these experiments provide only a partial, even if positive, answer to the main research question presented in the introduction, i.e. “can the pure syntactic organisation of the various parts composing a scholarly article convey somehow its semantic and rhetorical representation, and to what extent?”.
R1: I think if this article was saying “here is a collection of ontologies and frameworks that form a kind of hierarchy that could in an ideal world be used to describe the perfect semantic publication; and maybe you can even reason about the relationships between the layers” then I’d be a lot more comfortable with it. As it stands, either I’m not following the formal glue that’s used to hold the different steps together, or perhaps the formal glue isn’t quite as cohesive as it might be. Or some combination of these two things.
> I've explained these aspects more clearly in the introduction as well as in Sections 4, 5 and 6 of this revision.
R1: I’d suggest simplifying the formalisms (I really don’t think they add clarity), and including concrete examples. I think then it could make an excellent position paper that unifies several already significant pieces of work.
> The formalisms have been largely reduced, and the article has been extensively revised so as to introduce several examples that explain the theoretical foundations and the steps of CISE and its implementations.
R1: "Automating the Semantic Publishing" -> should this be “Automating Semantic Publishing” ? I’m not sure what the definite article is doing here.
I think the sub title is too dense for a general CS audience (or even for a ‘data science’ audience). It may make sense to someone that’s into formal methods. Maybe.
> I've removed the subtitle.
R1: Throughout the article needs a fair amount of attention to the written English; too many minor infelicities to list here. Because the article is technically dense, these risk tripping the reader rather too often.
> I've revised the whole article and corrected all the typos and grammatical mistakes I've found.
Reviewer 2: Karin Verspoor
R2: There is relevant literature that is not considered here
> I've added an entirely new section on related work, which cites all the papers and specifications suggested, among others.
R2: Note that the citation provided for categorial grammar is probably not the canonical/most relevant citation, see  inter alia.
> The citation about categorial grammar has been removed as a consequence of the revision of the article.
R2: In addition, I found some of the aspects of the theory and algorithms described in section 4.1 not entirely clear. The interpretation of some of the core structural patterns as named isn't intuitive, and the notion of "coherence" that is relevant to pattern assignment is not entirely clear, especially given that every possible combination of t,s,T exists.
> I've clarified all these aspects in the revision, by adding several definitions and examples.
R2: I also wonder about the use of the term "validity" in the Conclusions. In what way could this approach (as a whole) be validated as opposed to shown to apply in particular contexts?
> I've reworded the part about validity in the conclusions, removing the term “validity”, which was incorrect in that context, and clarifying that CISE (and its implementations) provides a partial positive answer to the research question presented in the introduction (Section 1).
R2: Note there are a few typos and language usage issues.
> I've revised the whole article and corrected all the typos and grammatical mistakes I've found.
R2: I would suggest "the Semantic Publishing" should be simply "Semantic Publishing" in the title and elsewhere
> I've revised the text accordingly.
Reviewer 3: Tobias Kuhn
R3: The theoretical links are not convincing
> I've entirely revised the theoretical part, which is now introduced in Section 3. In particular, I've focussed the argument on the principle of compositionality and on downward causation, and I've added examples of applications of both theories. The Curry-Howard Isomorphism and the Montague grammars are now briefly presented as exemplar applications of the aforementioned theories, without providing any additional detail that could distract the reader from the point of the article.
R3: Lack of discussion on limitations, possible downsides, and assumptions made
> I've added a new section (i.e. Section 6) for discussing the limitations and assumptions of the approach.
R3: I think Section 4 is the most interesting part, but unfortunately only the lowest layers are described in detail (which, in my view, are the least interesting for the main point the paper is making). I suggest to put less focus on the theoretical links and more on the practical experiences and preliminary findings at the higher levels.
> I've added several examples and explanations about all the aspects that have been implemented and tested. However, these implementations refer to the first four layers, which mainly concern the structural characterisation of scholarly articles. In their current form, such implementations show precisely how the idea behind CISE works. In addition, I've added more speculation about future developments of the research in Section 6.
R3: I don't think, however, that the principle of compositionality can really carry all the argumentative weight that is put onto it here. After all, there is no final proof that this principle really holds for natural languages in their entirety. There are in fact many known cases where compositionality doesn't hold, such as for idiomatic expressions or sarcasm. Furthermore, even if we assert the principle of compositionality as a fact, it would only tell us that we could in principle semantically parse papers in an automated fashion, but it would not allow us to conclude that this is feasible in any realistic setting. In fact, over and over again, all kinds of ambitious natural language processing has been proven to be very difficult and often infeasible with current technology.
> It is true that the principle of compositionality alone is not enough for the goal I want to reach with CISE, and I've clarified this position in Section 3. As a consequence, the part about the principle of compositionality has been extended and accompanied by another crucial theory, downward causation (i.e. the fact that higher layers can cause the specification of new meanings for a lower layer), which was only implicitly cited before. In addition, I've also clarified that, even if the principle of compositionality has failed or, in the best case scenario, has shown several drawbacks when applied to natural language processing, in this work I explicitly avoid using NLP technologies for annotating the various article parts. Rather, my approach is entirely based on pure Document Engineering techniques, which consider only the containment relations between the article parts, without regard to the natural language used for writing the articles themselves. Thus, the criticisms raised in the past about the use of the principle of compositionality in NLP do not apply in this context.
R3: At what accuracy do you think we can perform such a full parse of scientific papers? Automated approaches are never perfect (often with accuracy levels below 70% for non-trivial NLP tasks), and this seems to heavily affect the arguments made in the paper.
> Honestly, it is difficult to answer that question precisely, since it would be difficult to assess the overall annotations added across all the proposed layers. However, I've added some information about the precision and recall (~0.85) of the outcomes of the preliminary experiments we have run using the implementations of CISE introduced in this article.
R3: When do you think we will be able to perform complete semantic analyses of scientific papers? In 5, 10, 50 years from now? What should we be doing until then?
> It is difficult to say, since I have not yet experimented with CISE on the higher layers of annotations, and the corpus of documents we have used in the experiments covered only a few communities. Optimistically, the aim would be to implement a good mechanism for retrieving such multi-layer annotations by means of CISE within 5-10 years.
R3: Sometimes authors write in ambiguous sentences (also for human readers), and deliberately or accidentally leave out important information. With your approach, we are stuck with incomplete information in these cases, whereas involving authors in the process could solve this. This shortcoming of the approach is not discussed.
> Actually, by design CISE does not process the natural language text at all. Thus, it would not be able to identify the absence of such important information, nor handle other aspects related to pure NLP analysis of the textual content, e.g. the recognition of named entities. I've clarified this point in Section 6, where I discuss the limitations of the approach.
R3: The "iterative" part of the "compositional and iterative semantic enhancement" is not really explained. Is "iterative" referring to applying one layer after the other? To me, this wouldn't be an intuitive use of the term "iterative". I think "iterative" would imply to go through all the layers (or the individual layers) several times.
> I've added several pseudocode listings describing CISE and its implementations. This should clarify why CISE is iterative: it goes through the set of rules several times, until no new annotations are added.
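The iterative behaviour described in this answer can be illustrated with a minimal Python sketch. It is not the pseudocode from the article: the function `iterate_to_fixpoint`, the two rules, and the annotation labels (`block`, `container`) are hypothetical names chosen only to show a loop that reapplies a set of rules until a full pass adds nothing new.

```python
from typing import Callable, Iterable, List, Set, Tuple

# An annotation is modelled here as (element id, label); this shape is an assumption.
Annotation = Tuple[str, str]
Rule = Callable[[Set[Annotation]], Iterable[Annotation]]


def iterate_to_fixpoint(initial: Iterable[Annotation],
                        rules: List[Rule]) -> Tuple[Set[Annotation], int]:
    """Apply every rule repeatedly until one full pass adds no new annotation."""
    annotations = set(initial)
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for rule in rules:
            new = set(rule(annotations)) - annotations
            if new:
                annotations |= new
                changed = True
    return annotations, passes


# Hypothetical rules: each one can fire only after another has added its premise.
def section_rule(known: Set[Annotation]) -> Iterable[Annotation]:
    return [("s1", "container")] if ("p1", "block") in known else []


def body_rule(known: Set[Annotation]) -> Iterable[Annotation]:
    return [("body", "container")] if ("s1", "container") in known else []


# body_rule is listed first on purpose, so a single pass is not enough.
result, passes = iterate_to_fixpoint([("p1", "block")], [body_rule, section_rule])
```

With this rule order the loop needs three passes: the first adds the annotation on `s1`, the second adds the one on `body`, and the third confirms that nothing else can be derived.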
R3: I think "the" in the title should be omitted: "Automating Semantic Publishing" instead of "Automating the Semantic Publishing" (and same for the first sentence of the abstract)
R3: In general, I suggest to have a native speaker check the document with respect to grammar and style. At several places, I think that some of the used grammar constructs are awkward if not incorrect, but not being a native speaker either, I don't feel confident in my own judgment in what might be borderline cases or simply a matter of taste.
> I've revised the whole article and corrected all the typos and grammatical mistakes I've found. I've put a lot of effort into revising the text extensively, and honestly I did not have enough time to ask a native speaker to revise the text at this stage. I'll make sure it goes through a native speaker revision if the paper is accepted for publication.
R3: The first paragraph of Section 1 contains many links but no citations (except for the last sentence) that would provide evidence for claims like "... have resulted in ... acceleration of the publishing workflow".
> From my perspective, links are informal kinds of citations, even if it would not be appropriate for them (e.g. Wikipedia articles) to be included in the classic reference section with other scholarly works. However, I've added appropriate statements using the property cito:linksTo so as to formally define them as citations by means of the Citation Typing Ontology (CiTO).
R3: "... which is very close to the recent proposal of the FAIR principles for scholarly data": Very close in what sense? What are the differences?
> The sentence has been reworded to clarify that FAIR implicitly adopted such Semantic Publishing assumptions.
R3: "generally only a very low number of semantic statements (if none at all) is specified by the authors": Can you be more specific? What are the average/median/maximum values?
> The number of statements in a single paper presented during SAVE-SD workshops was found to range from 24 to 903, yielding a median value of 46 (25th percentile 34, 75th percentile 175). I've added this clarification in the article.
R3: With respect to the paragraphs connecting to Genuine Semantic Publishing, I am not sure whether an average reader is given enough background to understand this discussion. Maybe the issue of "should we or shouldn't we require authors to make a significant extra effort?" could be stated more clearly and more explicitly.
> I've revised the introduction in order to stress that, while authors should make that extra effort, they must be appropriately supported by at least semi-automatic mechanisms.
R3: "The idea is that the aforementioned approaches can work correctly only if used with documents stored in a particular format ...": Do these *approaches* really only work with a particular format, or is it just the current *implementations* of these approaches? I think this is an important difference.
> I've reworded the sentence in order to make the point clearer. In that context I no longer talk about approaches but about tools, which is more precise.
R3: Contrary to "... if the text to process is written in a particular language such as English, as happens for FRED ", I read on the linked website that "FRED is [...] able to parse natural language text in 48 different languages". This should be clarified..
> I've added a footnote that clarifies this passage. In particular, even if the official website of FRED claims it is able to “parse natural language text in 48 different languages and transform it to linked data”, from empirical tests it seems it has been trained appropriately only for English text. In fact, the other languages are not addressed directly, but rather they are handled by translating non-English text into English (via the Microsoft Translation API) and then by applying the usual FRED workflow for transforming the English translation into Linked Data.
R3: "It is worth mentioning that this approach is not merely theoretical, but rather it has been implemented ...": An important qualification here is that is has been *partially* implemented. None of these grammar correctly represent an entire natural language.
> I've entirely revised the text in order to explain the whole point better. In particular, CISE is now presented as a proper algorithm by means of Python-like pseudocode, and I've clarified that colleagues and I have provided some implementations of the approach in order to automatically infer annotations belonging to the first four layers. I've also repeated in the conclusions that CISE, and its current implementations, provide only a partial, even if positive, answer to the research question presented in the introduction.
R3: I didn't understand why "hierarchical markup" is needed as an assumption in Section 3. If you assume that natural language sentences can be automatically parsed at great accuracy (as you seem to be assuming), then certainly you can automatically detect the hierarchical structure of documents as well.
> Basically, it strictly depends on the input format of the document under consideration. The point is that CISE needs appropriately, hierarchically marked-up documents in order to work correctly. This is feasible to obtain if we consider XML-like markup languages and even LaTeX, but it is surely more difficult if we try to parse, for instance, PDF documents. In Section 6 I've tried to clarify why the need for appropriate “hierarchical markup” is so important.
R3: "there is no need of having a prior knowledge about the particular natural language used for writing the scholarly article": I don't understand what you mean by "prior knowledge" here. Somebody or something would need some knowledge (in fact deep knowledge) about the language to semantically parse the text at all the layers.
> I've revised the text in order to clarify that CISE does not consider the actual textual content of the article but only the containment relations between its parts. This justifies the fact that no knowledge about the language used for writing the document content is necessary for using the approach.
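A small Python sketch can make this language independence concrete. It is an illustration only, not part of the CISE implementations: the helper `containment_pairs` and the two sample documents are hypothetical, and the sketch assumes well-formed XML input. It collects (parent, child) element pairs and never reads the text nodes, so two documents with the same markup but written in different languages yield the same relations.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def containment_pairs(xml_source: str) -> List[Tuple[str, str]]:
    """Return (parent tag, child tag) pairs in document order, ignoring all text."""
    root = ET.fromstring(xml_source)
    pairs: List[Tuple[str, str]] = []

    def walk(parent: ET.Element) -> None:
        for child in parent:
            pairs.append((parent.tag, child.tag))
            walk(child)

    walk(root)
    return pairs


# Same markup structure, different natural languages (hypothetical samples).
doc_en = "<body><section><p>Some text <em>here</em>.</p></section></body>"
doc_it = "<body><section><p>Del testo <em>qui</em>.</p></section></body>"

pairs_en = containment_pairs(doc_en)
pairs_it = containment_pairs(doc_it)
```

Both calls produce the same list of pairs, since only the nesting of elements is inspected.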
R3: Figure 1: I think I understand the meaning of the colors in this figure, but I failed to understand the meaning of the x and y axes. This should be explained better.
> I've entirely revised the figure in order to make it clearer. In addition, it is now accompanied by additional examples.
R3: Section 4: I would have liked to learn a bit more about the ontologies, tools, and existing studies on the layers 4 to 8.
> I've extended the discussion a bit by introducing a particular aspect related to layer 4 in Section 5. However, I have not yet carried out substantial studies on layers 5, 6, 7, and 8 and, thus, it would be difficult to provide definite approaches or details about how to use CISE for inferring the annotations related to those layers. Nevertheless, I've extended the discussion about them a bit in Section 3 and in Section 6.
R3: Section 4: I would expect some of the most difficult but also most interesting kind of knowledge to extract from a paper to be domain knowledge, i.e. what the authors have found out about the world (e.g. about living organisms in the case of biology). I don't see this aspect anywhere in the 8 presented layers. This seems to be another limitation that is not discussed.
> Several existing applications, such as Named Entity Recognition (NER) tools, can be used to address this aspect successfully. However, the use of the aforementioned tools and other Natural Language Processing (NLP) technologies is outside the scope of CISE. I've clarified this point in the revision.
Reviewer 4: anonymous (review submitted after decision)
R4: it is not clear whether this is a position statement wrt semantic publications, or if this is a paper in which the author is presenting a framework. If the former, then the author fails to present a coherent position. If the latter, then the author fails in presenting a framework.
> I've extensively revised the paper in order to clarify its scope precisely. The particular position I try to defend is that it would be possible to use the pure syntactic organisation of the various parts composing a scholarly article for inferring its semantic and rhetorical representation. This aspect is now presented in the introduction in the form of a research question. The (partial) positive answer to this question is basically provided by showing how CISE has been implemented for inferring annotations related to the first four layers (mainly related to the structural characterisation of articles) presented in Section 3.
R4: Beyond these issues, there is the problem of readability. The paper is not clearly written, the English is not acceptable of publication standards, the paper is not well organized, there are typos, problems with punctuation, lack of examples, etc. It is not easy to understand the real scope of this paper, I am guessing the author is really trying to get a first impression about an idea; however, the idea is poorly presented. The problem is also not well discussed and again the paper is very disorganized –this is probably because the author is not clear as to the intention of the paper. The author should also consider the whole publication lifecycle; at the very least the author should somehow consider the publication workflow. The paper lacks scope, it is difficult to read, the English doesn’t make it easy for the reader to understand the paper.
> I apologise for the previous organisation of the text. However, even if they were not organised appropriately in the first submission of this article, the ideas introduced in this article are the results of several studies I have been involved in over the past six years, studies that have been tested appropriately on real-case scenarios. I've revised the whole article and corrected all the typos and grammatical mistakes I've found. As already said to Reviewer 3, I've put a lot of effort into revising the text and its organisation extensively, including the addition of several examples and explanations to make the whole discussion more robust. Honestly, I did not have enough time to ask a native speaker to revise the text at this stage. I'll make sure it goes through a native speaker revision if it is accepted for publication, though.
R4: The author makes quite a few unsubstantiated claims that should be better supported; there is quite a lot of literature that is not referenced in this paper and that is quite relevant. There are also commercial applications that should be analyzed against the ideas presented by the author, e.g. nature science graph, and others that I give in my review.
> A new section (i.e. Section 2) has been added so as to discuss related works that have been presented in the past. However, I preferred to focus on approaches and implementations that aim at inferring annotations from article sources instead of talking about the existing commercial and non-commercial services and resources which publish datasets of article metadata, such as Springer Nature SciGraph or OpenCitations.
R4: The author touches on semantic publications but does not contextualize the work within any publication workflow. How are we getting there? Is this an approach that will work only for new publications that are born within the idea/framework/context that the author is presenting?
> I've revised the introduction a bit and, in particular, the section about the theoretical foundations, so as to clarify this aspect better. Honestly, I don't think CISE is applicable only to new publications, since one of its main requirements is that the article be described with appropriate hierarchical markup, something that has been available for years (e.g. see PubMed Central with JATS, or ScienceDirect with the Elsevier Document Schema).
R4: The author does not seem to be well aware of current publication platforms that are doing many of the things that he is describing; to name but a few: https://science.ai/overview, the work at elife labs with lens and R markdown, semantics for Jupiter notebooks, nature science graph, Cochrane linked data, ZPID linked data. There is Biotea, this is somehow doing just what the author is describing. Also, close to the Biotea experience there is “Semantator: Semantic annotator for converting biomedical text to linked data”. There are more examples of work seeking to add semantics to publications. Some address the problem for existing publications; some other authors are working on solutions for novel publications. Either way, all of that is relevant for the work the author is trying to do.
> I've added a section about related works where I introduce some relevant research works in the area. While all the works cited by the reviewer are important, most of them do not explicitly tackle the issue introduced in this paper, i.e. whether the pure syntactic organisation of the various parts composing a scholarly article can convey somehow its semantic and rhetorical representation. Thus, instead of focussing on interfaces for allowing a human to annotate article parts, I've preferred to focus on some of the existing automatic mechanisms (based on NLP, OCR, Machine Learning tools) that have been proposed in the past for characterising article components.
R4: There is annotation and NLP written all over the work presented in this paper; something the author is not clear about. For instance, automatic annotation is not perfect, far from it; also, there is a lot ambiguity in domain ontologies –ambiguity that is inherited by the annotations. Has the author considered any of these in his approach? Moreover, if the author is going for reasoning over several annotations from various ontologies then I would like to see a really convincing case. SNOMED and MEDRA, ChEBI and PubCHEM illustrate how this is difficult and may lead to contradictions. A running example could do a lot for this paper. In addition, it is not clear if the author is talking about annotation as NER or annotation using NLP pipelines; in any case, neither one of them gives 100%, so one has to also consider human annotation. If human annotation is involved then what could the task look like? What quality parameters (e.g. inter-annotation agreement) should be considered, will the annotations be part of the semantic layer of the paper or will this be an additional payer somehow attached/related to the paper?
> I've added a lot of examples in all the sections of the article so as to improve the readability and the understandability of the approach. I've also clarified at several points (e.g. in the introduction and in Section 6) that this article and, thus, CISE are not about using NLP or NER tools for inferring annotations of the article parts. Rather, the approach presented uses only the containment relationships that exist between document parts as the starting point for inferring multi-layer annotations.
R4: > What is the incentive for the author? Shouldn’t this be the replacement of the typesetting in the publication workflow? Is there a benchmark for tools that can automate the process? Adding the semantics, what advantages? What incentives? When in the publication workflow? Whose work is this? From my experience running a workshop addressing issues in semantics in scientific publications I could see how these are issues that need clarity for everyone. These issues are also related to author's available time, nowbody invests time and effort without first knowing why, what for, and how.
> In the case of the SAVE-SD workshops, the incentive provided to authors for including RDF statements in their HTML submissions was the prize (in euros) for the best HTML submission, which was selected also according to the number of RDF statements specified. No other kinds of explicit incentives have been provided by the organisation.
R4: > sure, this has been addressed in the past by many authors but… it has also been said that automatic annotation has quite a few problems. The accuracy of the annotation and also the quality of ontologies and also the fact that these annotations will come from several ontologies and this leads to the problem of reasoning over multiple overlapping ontologies that mostly likely will bear contradictions. If the author is talking about human annotation then again, there are quite a few issues to address and investigate before doing it. Mark2cure has had relative success but they work in an overly scoped domain with an overly scoped annotation task. Also, how are u planning to define the annotation workflows involving humans and software?
> While it is an interesting discussion, this article is not about human annotations, nor about proposing a workflow for enabling human+software annotations of papers. The whole work is focussed on trying to provide some insights for answering the research question presented (in the revision) in the introduction.
R4: > This is really interesting but also poorly charted territory in which lots of authors have not really succeeded in the past. Is the author advocating human annotation for the identification of function of the citation or is this some sort of sentiment analysis kind of automated task?
> I've reworded this point in the revision, since it wasn't clear that I was talking about the automatic identification of citation functions.
R4: > Interesting, I have checked the SPAR ontologies and as ontologies they model some of this. However, I could not find the SPAR extractor suite.
> The SPAR extractor suite is included in the RASH Framework, available at https://github.com/essepuntato/rash. I've added a link to the GitHub repository in the revision.
R4: Fred is limited; it is yet unclear how could FRED be applied to a wider context than that described by the authors.
> As far as I know, FRED has not been developed for processing scholarly articles only. It is a flexible tool that can be used, in principle, on any text in order to convert it into Linked Open Data.
R4: > Do u mean different languages as in English, Italian, Spanish? Or do u mean different narrative structures? If the latter then, what are the narrative structures most commonly used in scientific literature? This has already been studied before.
> I've revised that passage so as to clearly refer to languages such as English, Italian, Spanish, etc.
R4: > Once again, the author needs examples everywhere.
> I've reworded the passage and clarified everything by using several examples.
R4: > The whole paragraph is really complicated. As just opinions it is fine. For a position statement paper I would expect to see this very well supported. As a paper in which the author is presenting a framework, this needs a lot of work.
> I've clarified this point in Section 6, where I talk about the limitations of the approach.
R4: > Ok. Lets start by saying that strictly speaking these are not hypothesis. Consider writing these a problem statements, research questions or outright hypothesis.
> In the revision, I'm now considering them as conditions for the applicability of CISE.
R4: The body>section>paragraph I don’t understand. Everything else in these points is really arguable and needs better backup in the form of references that really support these as assertions. For instance, “there is not need of having a prior…..” perhaps an example could come in handy. The author should focus on a particular domain, select a well-defined corpus of documents and elaborate from there on.
> I have reworded the whole passage in order to clarify these points better.
R4: > This is, IMHO, the most interesting paragraph of the paper. However, the lack of an example just diminishes its importance. Also, the way it is written makes it seem like a lot of opinions. Once again, if this is a position statement paper I would consider this to be part of the overall position. But, even as a position statement the author needs to give the reader more than just his word for unsubstantiated claims.
> Several examples have been added so as to support the whole discussion.
R4: > This is fine, but… ontologies are just models and data talks lauder than words. So, in order to convince me I need to see data, not just models; along with data I would like to see tested data. If just ontologies then at the very least I would like to see instantiated ontologies so that data tells me how you are right. Once again… I need a running example in this paper, I would like to author to focus on a clear message, I would like this paper to be much better articulated. The large number of self-citations are not helping the author in making a clear case –his own previously published papers may arise further evaluation under the light of his claims in this paper. Also, there are lots of papers that should be cited here and that are missing. His section “from structural patterns to structural semantics” indicates me that it is quite simply easier to map JATS/XML elements to ontologies (bearing in mind minimal ontological commitment) rather than embracing many of the things the author seems to embrace. The text that usually follows the e.g. is not enough explanatory; again a running example could help the author to make his case.
> I've added several examples about how these ontologies have been used to annotate article parts by means of CISE. In particular, I've clarified that CISE does not consider a particular markup language like JATS, but is flexible and can be used with any markup language that allows article parts to be organised hierarchically, without any prior knowledge about the schema of the language and its vocabulary.
R4: > Yes, like for example?
> I've clarified this point better in Section 6 and in the conclusions.
R4: The tittle reads funny, the author should consider making it closer to the content of the paper. Also, the analogy ised in this paper needs some work. I don’t see how it is relevant.
> I've modified the paper accordingly.