We would like to thank the three reviewers for their very valuable and thorough reviews. Our revised version addresses their comments and suggestions, and we provide point-by-point responses below.

> Review #1

> I would like to see more examples of how the criteria of genuine semantic
> publishing would apply in practical cases, such as a research paper presenting
> a scientific experiment and its results. What would qualify as a light-weight
> and fine-grained representation in this case? Would a natural language
> description of the results and the claims be enough? I think that some more
> examples would be very useful as guidelines to both authors/publishers that
> want to follow the paradigm of genuine semantic publishing and to developer
> that intend to implement relevant tools.

We have now added such a discussion with examples to the end of Section 4.

> I believe that the categories in Table 1 should be explained better. For
> example, I don’t understand why scholarly HTML could not represent "program
> code" or "domain data". It is possible to include both as RDFa.

We clarified this, but we are not aware of any approach that represents executable code (such as a Python script) and its input and output data (such as tables with thousands of entries or more) in RDFa in a *reasonable* way.

> "Semantic representations can only be considered authentic if they originate
> from an agent that is authoritative in a given situation. In the case of
> publication of scientific results, the only authoritative source are the
> researchers..." Here it would be interesting to mention/discuss data
> provenance and how it relates to your claim.

We added a short passage mentioning provenance representations, but we do not want to go too deeply into technicalities at that point in the paper.

> "More so than narrative texts, semantic representations can be broken down
> into independent pieces that can be interpreted independently." Context can be
> of utmost importance in science.
> Don’t you think that a statement taken by
> itself would risk being misinterpreted? Let’s take the case of "A causes B".
> Does it mean that a study found a statistical correlation? Is it significant?
> And how strong is the correlation? Is the correlation subject to some other
> condition?

Yes, context is very important, but context can and should be formally represented as well. If you write an RDF triple like "s:A p:causes o:B" and if that triple is true in its context, then it must also be true when interpreted independently. The (minimal) semantics of RDF enforce this. Otherwise RDF semantics are violated, and the statement should have been modeled differently. We added some discussion on this.

> Minor corrections:
> non-intuitive way, Instead
> non-intuitive way. Instead
> claiming: The main message
> claiming: the main message

Done.

> Review #2

> Related work was primarily on semantic publishing within scholarly
> communication, however the semantics that it wanted to improve on was the
> general "semantic publishing". If the original "semantic publishing" needed to
> be clarified further or was deemed to be inappropriately used in practice, I
> would have expected the related work to focus more on the wider use/discussion
> around "semantic publishing" as opposed to the scholarly communication
> context.

We further clarified our focus on scholarly communication in the first sentence of the abstract and in the introduction.

> The proposed criteria tends to focus on "static" published or publishable
> information. It can be used to check the "quality" of works, however, it
> doesn't provide sufficient guidance (or criteria) on how to account for things
> like interactivity - this may be classified under "essential coverage" but it
> wasn't clear to me (perhaps I've missed it). [...] It might be useful to
> further describe what dimensions are included (i.e., the current criteria),
> and which are intentionally excluded.

We added a paragraph to Section 4 ("Before we move on to demonstrate ...") to explain why interactivity is not part of our criteria.

> The screenshot of the index page and the accompanying helps to understand the
> example, however, I think it may be more useful to show an abstraction. Is the
> "index" of available representations/essential coverage an important or
> required unit to have?

We considered showing an abstraction instead of Figure 2, but we think our points are easier to understand and more intuitive with a concrete screenshot. And no, the landing page and its "index" are not required components. We clarified this in the text.

> Review #3

> The paper does not distinguish between settled science and the forefront of
> science. The importance, for the forefront, of representing a paper's
> arguments (as opposed to its claims) does not seem to be taken seriously.
> Statements without justification seem to me to be only useful for settled
> science, not for work that might be contested or counterargued.

We now discuss this at the end of Section 4.

> The intended use and application of the semantics for the TriG file could be
> made more clear: is it to refer back to (in a future semantic publication)? To
> index?

These TriG files are nanopublications that encapsulate the Turtle content and have additional metadata and provenance. This is now clarified in the text.

> to my mind, arguments (more detailed than the assertions and sub-assertions
> from the narrative in the sample TriG) will be needed to serve many aims of
> publishing at the forefront of science.

We agree, and we added a paragraph to Section 5 ("We would like to note here that ...") that puts our own representation into perspective: it achieves the criteria at a basic level, but much more can and should be done in the future.

> While the authors are well-versed in semantic publishing, they miss some
> current trends.
> SEPIO is a stand-out: Brush, Matthew H., Kent Shefchek, and
> Melissa Haendel. "SEPIO: A Semantic Model for the Integration and Analysis of
> Scientific Evidence." ICBO/BioCreative. 2016.

Thanks for reporting this omission. We now include it at the end of the Related Work section.

> Dokeli generates RDFa under the hood; problems with this RDFa should be
> directly addressed

We now clarify and emphasize that scholarly HTML solutions like Dokieli have a slightly different focus but can play an important role in genuine semantic publishing. The RDFa that is generated automatically under the hood by Dokieli represents, apart from the metadata, just the narrative text paragraphs in RDF literals.

> RASH and Dokeili should be considered for Table 1 (maybe they don't fit but
> it's not immediately obvious to me either way).

They are both specific implementations of Scholarly HTML. We clarified this in the text.

> Stronger arguments about WHY this vision has the potential to have a positive
> impact are needed. What is the longer-term intent and implication of this
> work? Does it have any practicality and practical impact? Or does it, at the
> least, drive a research agenda that will lead towards better scientific
> publishing or better scientific knowledge management, in practice, in the
> semantic web community or at large? These points are not really addressed but
> they seem (to this reviewer at least) an essential part of a real vision in
> this area.

We added a longer paragraph on this at the end of Section 4.

> Even taking the paper itself, and admitting its arguments, I think this work
> could do a much better job of forefronting the key ideas of the proposal. I
> think that those ideas may go beyond the 5 criteria.
> For instance this
> statement is vivid, intriguing, and possibly groundbreaking, but it is not
> supported by the text: "We will argue below that narrative text necessarily
> remains an important part of scientific discourse and communication, but it
> also has to be possible to publish data that is self-explanatory due to its
> formal semantics without the need for a narrative." To my mind, if this is a
> point you want to make in THIS paper you should make it strongly.

This is a very good point. We now elaborate more on this in several paragraphs in Section 4.

> Overall I think that the paper could devote more effort to persuasion and
> carrying the reader along.

By addressing the other comments and making our arguments clearer and more explicit, we hope that the paper is now more persuasive and better at carrying the reader along.

> While the first line discusses scientific publishing, "semantic publishing"
> could be misunderstood to focus on non-scholarly content as well. Consider
> modifying the title to be more clear.

We mention this now in the first sentence of the abstract and further clarify it in the introduction.

> I took a look at the Berners-Lee/Hendler article from 2001. (Consider adding
> DOIs; for that one, for instance, it's 10.1038/35074206 ).

We have DOIs/URLs for all references (see HTML versions), but these don't always show up with the given BibTeX reference style.

> [...] That part of the Berners-Lee/Hendler vision is not fully achieved, true,
> but this kind of work is really going on. You cite some of it (e.g isn't this
> what your reference #18 does?)

Yes, such approaches that cover many (maybe all) of our criteria do exist, but they do not seem to propose or follow a general vision of what we should be aiming for in the longer run. This is the gap that our paper is trying to fill.

> For instance, one very successful recent example comes from the adoption of
> RRIDs: ...

Thanks for this pointer.
This is indeed a relevant effort, and we now cite this paper in the Related Work section. We see this as an important first practical step, but not more than that. RRIDs allow us to unambiguously refer to biomedical resources, but not to make any statements about them (such as expressing the relations that hold between them).

> Shotton's 2009 paper is used as a strawman;

You can call it a strawman, but it is a strawman that accurately reflects the majority of works that claimed the term "semantic publishing". We updated the text to reflect this better.

> However you have not really established why your definition is the best
> alternative. In particular, your "genuine semantic publishing" will break with
> all current publishing: few previous publications will have been enriched by
> their authors.

Yes, it will break with current publishing. We argue that we need a bold vision for the future, and such a vision, by definition, breaks with the status quo. We cannot prove that our proposal is the best possible, but we hope that the added paragraphs make it more convincing.

> In my view, we should aim for BOTH machine and human-readable text;

Yes, we agree 100%. We demonstrate in Section 5 ("Genuine Semantic Publishing in Action") how machine and human-readable representations co-exist with our approach.

> while content-negotiation is a good thing, managing multiple versions for
> different types of consumers means they could get out of sync (and they have
> no expectation of carrying the same content).

This is indeed a difficulty during the authoring stage of a paper, but not an insurmountable one. Authors already have to ensure that the abstract and conclusion of their paper, for instance, remain in sync. In the same way, they should also take care that their formal statements remain in sync with their narrative text.

> Shotton is not the only visionary of semantic publishing; among others, you
> could look to Steve Pettifer's ~8 papers (on Utopia documents, OpenPHACTS, and
> about semantic publishing in general). (About half of these are cited in
> Wikipedia currently, for easy reference, see the bibliography of
> https://en.wikipedia.org/wiki/Semantic_publishing )

We chose Shotton as our main entry point for our argument because he explicitly focuses on the term "semantic publishing" and presents it (sometimes) as a vision for the future. Again, this is not a criticism, as it is an accurate reflection of the recent body of literature around "semantic publishing". Steve Pettifer, in contrast, uses the term "semantic publishing" only rarely and mostly when referring to related work (two out of the four papers cited on Wikipedia you refer to do not mention "semantic publishing" in the text at all, and another one only mentions it once with respect to related work). Steve Pettifer also does not propose an alternative concept or long-term vision for "semantic publishing". This is, of course, again not a point of criticism, but it explains why Shotton is a better basis for our discussion, and it underscores the importance of our goal to fill the gap and actually propose such a vision.

> PDFs are not necessarily incompatible with RDF; Adobe's XMP metadata can be
> embedded in documents or pushed into metadata 'sidecars'. Crossref folks even
> experimented with XMP and they once released an open source tool for pushing
> metadata into PDFs given a DOI:

We don't claim PDF to be incompatible with RDF, only that such formal semantics do not "come naturally" with that format, which we think is a fair and uncontroversial statement to make. To further clarify this, we added a note and a reference to the PDF-based approach of Utopia documents.

> I do not find Figure 1 compelling; your mileage may vary with other readers
> and reviewers.
> I disagree that "By only looking at the formal semantics, one
> can possibly find out the topic of the paper but not what the paper is
> actually claiming: The main message is missed." This is not inherent in
> post-hoc annotation, nor in annotation by non-authors. I would point, for
> instance, to entire industries with paid (often PhD-level scientists) who
> curate scientific knowledge bases and databases (e.g. authors and readers of
> Oxford's _Database_ journal) as well as write extracts/abstracts for
> publishing/aggregating companies like EBSCO.

We are discussing (and Figure 1 is showing) the scientific paper at the moment of publication. We argue that this is the relevant moment to look at, if we interpret the term "semantic publishing" literally (otherwise we should call it something like "semantic post-publication enhancement"). In our proposed usage of the term, it doesn't matter how many scientists or entire industries semantify the main message of the paper at a later point. It doesn't make the paper a semantic *publication* (in the genuine semantic publishing sense).

> I find this statement unhelpful: "It seems to be a common unquestioned
> assumption that the semantic representation of knowledge has to start from a
> textual representation, and therefore writing a statement down in natural
> language always needs to be the first step." It really doesn't matter which
> comes first.

It does matter for approaches like semantic annotation, because semantic annotation assumes that the text is already there when you start annotating.

> However the (linguistic) semantics of your narrative are much richer than
> those of your TriG. Narrative does have an important role.

We fully agree. The narrative has richer semantics, whereas our RDF representations have more precise semantics. We argue that we need both.

> Historians and sociologists of science have argued that writing up work (and
> specifically writing and revising narrative arguments for presentation and
> publication) helps form it. Lavoisier provides a vivid example: as Moore
> summarizes, "Lavoisier wrote at least six drafts of the paper over a period of
> at least six months. However, his theory of respiration did not appear until
> the fifth draft. Clearly, Lavoisier's writing helped him refine and understand
> his ideas." (Moore, Randy. "Language—A Force that Shapes Science." Journal of
> College Science Teaching 28.6 (1999): 366.
> http://www.jstor.org/stable/42990615) A longer treatment (complete with
> facsimiles of Lavoisier's manuscripts) can be found in Holmes, Frederic L.
> "Scientific writing and scientific discovery." Isis 78.2 (1987): 220-235.
> http://www.jstor.org/stable/231523

Thank you for these very interesting pointers. In fact, we think that the importance of writing and revisions is a strong argument for (and not against) our point: With approaches like semantic annotations, semantics come into play only at the very end, once the text is finalized. We argue that we should treat the semantic representations as first-class citizens that participate in all these iterations together with the narrative text. We clarify this now in the respective paragraph and also refer to the second paper you mentioned.

> I suppose that in the end my main concern is that your TriG representation
> does not do the narrative of your paper justice; and I am unsure how "genuine
> semantic publishing" would do any better, on average in representing the
> content of the paper. Hence my concern with the intended use here.

We think that our comments above and our latest revision (including the added paragraph in Section 5 to put our own semantic representation into perspective) address this concern.

> Do you think that we need to do anything with historical papers? (e.g.
> do we
> already have the knowledge they represent? do we need it?) It is not clear
> from the point of view presented. You disparage certain activities (such as
> the Semantic Publishing Challenges) -- do you see any value in those (even
> though they don't address publishing per-se?

Yes, we definitely do. We clarify this now in an additional paragraph at the end of Section 2.

> This seems an overstatement: "The possible use of RDFa to formally represent
> not just meta data but also high-level claims, hypotheses, and arguments is
> sometimes proposed, but no concrete solutions are presented." Certainly,
> integrating nanopublications or micropublications has been proposed -- what is
> not concrete enough? Regarding RDF in general, older approaches have been
> taken. This one comes to mind: Li, Gangmin, Victoria Uren, Enrico Motta, Simon
> Buckingham Shum, and John Domingue. "Claimaker: Weaving a semantic web of
> research papers." In International Semantic Web Conference, pp. 436-441.
> Springer, Berlin, Heidelberg, 2002. If I am misunderstanding your claim,
> consider how you might make the scope of your statement clearer.

This sentence was meant to apply only to scholarly HTML approaches, which support such higher-level statements but do not further specify how to represent them. The sentence is now clarified.

> "Structured abstracts" have a specific and more general meaning; I'd suggest
> that about "structured digital abstracts" you dwell a bit more on the papers
> you are citing to discuss what they did.

We added a sentence to clarify this.

> Alongside RASH I would suggest mentioning and citing dokieli (which anyway you
> are already using for an alternate representation) where you mention that in
> the text. And as mentioned above, it is not clear why these are elided from
> your review proper.

Agreed. We now explicitly mention and refer to Dokieli.
We previously wanted to focus only on the Linked Research principles and not on the specific implementations, in line with their philosophy, but we agree that it makes more sense to mention them both. We consider both RASH and Dokieli to be instances of scholarly HTML. Therefore, we did not leave them out, but we now make this connection more explicit.

> Some statements about micropublications do not fit with my understanding;
> [...] Note that semantic qualifiers enable indexing claims with existing
> identifiers. And in micropublications I think a real strength is the attention
> to arguments within a paper; but the suggestion that micropublications are
> *limited* to the scope of a paper is not right (e.g. "stick to the article as
> their unit of publication" -- do you mean something more subtle there?). In
> the Micropublications JBS article see especially Figure 11 "Connected support
> relations of three arguments give a Claim network across three publications."

We are referring to the "unit of *publication*", not to the "unit of reference" or "unit of interlinking". The object that is published with micropublications is still a sort of article (albeit "micro") and not an individual statement.

> I think that your work on AIDA sentences and the proposal to use them (along
> with some hedging/uncertainty markers) for nanopub publishing is great -- but
> I don't think that this is the same thing as representing the internal
> structure of the argument. I'd be very happy to hear what I'm
> misunderstanding.

We agree that this is not the same thing and that our work on AIDA sentences only makes a very small step in that direction. Our reference to that paper (Kuhn et al. ESWC 2013) in the context of nanopublications was indeed confusing. We removed the respective sentence and the citation to the paper.

> You say SPAR is highly valuable -- how/for what? Who is using it? How should
> it be used?

This is clarified in the text.

> I'm surprised that you don't mention CNL; especially regarding "Explaining a
> result in a narrative is simpler than formally modeling it, in the sense that
> natural language allows the writer to remain vague and even ambiguous." (which
> seems to me not true for CNLs.)

A controlled natural language is, by definition, not a natural language, and so the existence of unambiguous CNLs does not contradict our statement. We did not want to make this a paper about CNLs, and we decided not to include a deeper discussion of CNLs to avoid the danger that readers might feel that CNLs are somehow our favorite approach to achieve genuine semantic publishing. I personally think that CNLs could turn out to become a very useful concept to achieve it, but this is not the point of our paper.

> Stating "we argue that" does not give a justification or rationale. Why do you
> think this? "Furthermore, we argue that the semantic representations need to
> be a primary component with an existence in their own right, to call it a
> genuine semantic publication. The main thing that is published needs to have a
> semantic representation, and this semantic representation needs to have an
> independent existence."

We rephrased these sentences.

> Availability at time of publication seems to go in the other direction: they
> should be temporally locked to the original.

They should be locked together, but both should be equally "original".

> The notion of "essence" or "main message" is not operationalized.

We believe the following sentence operationalizes these notions as far as this is possible for such kinds of criteria: "If you had to summarize a paper in one sentence, the content of this sentence has to be present in the semantic representation too."

> Data representation of the paper could be stored in a FAIR repository.

We will do that as soon as we have the final version of the data.

> (No I am not reading this on a beach.
> :D )

We are sorry for that :)

> For "Meta data" personally I would write "metadata".

We changed this.

> Explicitly reference the supplement when talking about files (e.g. end of
> section 5)

Done.

> Consider writing a longer conclusion.

We considered this, but given that we are already approaching the word limit, we decided to further clarify our contributions throughout the paper and stick to a shorter conclusion.

> Table 1 would benefit from shading (e.g. on alternate rows) to aid the eyes.

Done.

> Figure 2's caption could include the URL to your actual landing page.

We added a reference to the URL in the text.

> Add hyphens: "English-speaking agents", "English-based representation",
> "RDF-speaking agents", "RDF-based representation"

Done.

> "who are called authors" (not "which" here)

Done.

> Reference 15 is missing a venue. Check capitalization especially in #25
> and #30.

We fixed these and other minor issues with the references.
Link to Final PDF and JATS/XML Files
Submitted by Tobias Kuhn on
https://github.com/data-science-hub/data/tree/master/publications/1-1-2/ds-1-1-2-ds010