Reviewer has chosen not to be Anonymous
Overall Impression: Weak
Suggested Decision: Accept
Technical Quality of the paper: Average
Presentation: Weak
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: This manuscript is too long for what it presents and should therefore be considerably shortened (below the general length limit)
Summary of paper in a few sentences (summary of changes and improvements for second round reviews):
The paper reviews existing persistent identifiers, their use and usefulness in the context of the FAIR principles. Based on those findings, it presents a new format that seeks to offer the features of the most successful systems.
Reasons to accept:
Overall, as my answers to the specific questions show, I'd say this is a weak accept. There's some good material in the paper. For example, the text includes comparisons of the longevity of different PIDs and the accuracy with which they lead to the identified thing. That's useful information although the paper would benefit from presenting those details as tables, charts etc.
Reasons to reject:
The paper would benefit from better layout, diagrams etc. and none of the references appear in the text due to poor HTML.
More substantive, I'd suggest, is that there's a big dollop of open access/paywall discussion that, for this paper, is irrelevant. In my opinion, this should be removed. The paper is about the discoverability, resolvability and persistence of IDs - stick to that topic and don't go into a discussion of paywalls/access to research etc.
Nanopublication comments:
Further comments:
I'd also suggest a switch around in the order of presentation. The suggested new PID format comes at the end rather than at the beginning: "this paper proposes a new PID structure that addresses a series of issues identified with existing structures" or some such.
Due to my current work at GS1, I found the comment on ISBNs very interesting. They are indeed persistent and pre-date the Web by some decades. Most importantly - they are used throughout the industry that runs the system (the publishing, supply chain and retail industries). The paper refers several times to the - correct - notion that persistence comes from usage rather than design. The check digit is included so that if the point of sale scan fails, the number can be entered into the till manually and there's a good degree of checking that it was entered correctly.
However, sadly, it is not true that ISBNs provide a 1-1 mapping. ISBNs are just a special case for the even more widely used UPC/EAN numbering system used on all manner of goods. At the end of the day, it's just a number - and they are cloned/re-used around the world. It's being addressed, sure, and it's not a massive problem, but it's not as perfect as the paper suggests.
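To illustrate the check digit mentioned above, here is a minimal Python sketch. It assumes the standard ISBN-13 (EAN-13) scheme, in which the first twelve digits are weighted alternately by 1 and 3; the function names are hypothetical and chosen only for illustration. Note that it checks only that a number is well-formed, which is exactly why it cannot detect the cloned or re-used numbers discussed above.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13 (EAN-13)."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, digits and check digit; hyphens and spaces are ignored."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))  # correctly keyed number
    print(is_valid_isbn13("978-0-306-40651-7"))  # two adjacent digits swapped
```

A single mistyped digit always changes the weighted sum, and most swaps of adjacent digits do as well, which gives the "good degree of checking" at the till.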
One thing - and this I must admit is a personal hobby horse - persistence is a matter of policy, not technical design. Link rot is a real problem because people allow it to happen, not because of an innate property of the Web. I'd have loved to have seen that point included in the paper.
Meta-Review by Editor
Submitted by Tobias Kuhn on
Dear Joakim,
Thanks for your article re-submission and for the responses to reviewers. You will see that from this new round, we received 5 reviews from experts on identifiers from ORCID, ARK, CrossRef, identifiers.org and web architecture/persistent identifiers. There are varying opinions on the significance and novelty of your contribution, and several suggestions for improvements. In particular, some of the arguments made are not well-justified. I agree with those comments and suggestions for improvement, and I am listing below more issues that must be addressed. As a consequence, my recommendation is to accept the contribution, conditional on all the changes being incorporated and the paper improved. I also expect to see an enumeration of the changes and a justification of how the suggestions were addressed.
I consider this to be a position paper, and the journal’s submission guidelines indicate: “We accept position papers presenting discussions and viewpoints around Data Science topics. These papers do not need an evaluation, but need to present relevant and novel discussion points in a thorough manner.” (see https://datasciencehub.net/content/guidelines-authors)
Thus, I strongly encourage you to present the discussion justifying all your arguments in a thorough manner as part of the condition for acceptance.
I find that many of your statements in the paper (as well as in the response to reviewers) are not well-justified. For example, in the section about FAIR principles you indicate “There are several cases where general data repositories, professing to be FAIR and adhere to accepted metadata standards both for their default output and export formats, nevertheless fail to validate against schemas of these same standards.” This statement is justified neither with examples nor with a citation (see also the issue with citations that I raise under the formatting issues below), and this should be addressed. If you say “there are several cases”, you should provide examples of those cases, including which export formats fail validation and why, and/or a reference that provides those examples.
You refer to “validability” of identifiers, and refer to regular expressions. There are already systems that maintain such expressions for identifier validation, such as identifiers.org (see for example the entry for DOI and its regular expression: https://www.ebi.ac.uk/miriam/main/collections/MIR:00000019). How does this affect your arguments? Justify.
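As a rough illustration of what such registry-maintained validation amounts to, here is a minimal Python sketch. The pattern is an assumption made for this example (the DOI directory indicator "10.", a registrant prefix of 4 to 9 digits, a slash and a non-empty suffix); it is not the expression recorded in the identifiers.org entry, and the function name is hypothetical.

```python
import re

# Assumed, illustrative pattern; NOT the expression held by identifiers.org.
# "10." + 4-9 digit registrant prefix + "/" + non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def looks_like_doi(candidate: str) -> bool:
    """Syntactic check only: says nothing about whether the DOI resolves."""
    return DOI_PATTERN.fullmatch(candidate.strip()) is not None


if __name__ == "__main__":
    for value in ["10.1000/182", "doi:10.1000/182", "10.1000/", "11.1000/182"]:
        print(value, looks_like_doi(value))
```

A check of this kind confirms only that a string is well-formed. Whether the identifier is registered, resolves, or persists is a separate question, which is why the relationship between validation and the paper's other arguments needs to be justified.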
In the text, you are referring to interoperability and then say “This is also used by fairmetrics.org as a measure of Findability.” - how is interoperability used as a measure of findability? I don’t think that is correct and you provide no explanation.
You insist that the FAIR principles don’t refer to findability, but you say: “However, the FAIR principles do not say anything explicitly about validation. Particularly for the principles of Interoperability and Re-usability, it is crucial that metadata can be properly validated against a schema, as adhering to an accepted metadata standard.” So, you are providing a counter-argument that in fact the FAIR principles refer to validation explicitly. Please clarify.
You introduce examples using Life Science Identifiers (LSID) but do not discuss the issues around them. You can check the Wikipedia entry about LSID (https://en.wikipedia.org/wiki/LSID) and, in particular, see “Controversy over the use of LSIDs” (https://lists.w3.org/Archives/Public/www-tag/2006Jul/0041). How does this discussion affect your arguments? Include a justification.
In the section ‘Resolvability or findability?’, you mention that “FAIR principles, the focus is very much on resolvability of identifiers despite the general awareness of phenomena like 'link rot' and 'reference rot'.” What is the basis for this claim? The FAIR guiding principles don’t refer to link rot issues.
You mention “When someone in an ensuing Twitter conversation complained about this, an answering tweet seemed to mean, that was the price we have to pay for something as useful as DOIs.” Twitter could provide some anecdotal material, but a tweet is not a good reference for justifying a claim in a scientific article. In addition, you say that the tweet “seemed to mean” something; this interpretation again doesn’t help in making a case. Moreover, the tweets are not referenced. Please use more reliable references than tweets to justify your arguments.
Regarding your proposal of a new identifier scheme that maintains context in the identifier, and is thus not opaque, I would like to see an explanation of how your scheme would handle the identification of objects that might change or evolve in the future. For instance, consider the identification of genes, whose information may evolve over time as new scientific discoveries are made about them. Also, I would like to see a presentation of how your proposal improves on the other identifier schemes, and how it improves FAIRness.
Please address all the suggestions made by John Kunze and the modifications proposed in the Google document, including addressing his points around weak arguments, such as:
It is not clear why, “for example, the argument that usability and persistence depend on validatability.”
“it is not clear why the object type, and registrant "modules" proposed for the PID could not live next to, but outside the PID, in a citation.”
Also, address Patricia Feeney’s points, including full justification for:
“case for non-opaque identifiers was not clearly stated, the author still conflates accessibility with discoverability and doesn't address arguments for opaque identifiers.”
Also address all the suggestions made by Phil Archer:
Improve the presentation, including diagrams and tables where relevant (e.g. to show the longevity of different PIDs and the accuracy with which they lead to the identified thing)
Switch the order of the presentation to show your proposal of the PID format first; I would suggest that you also provide a comparison table to show how your PID format would improve the issues that you highlight
Address the issue raised around ISBNs
Formatting issues:
Please fix the citations; the HTML is not well-formatted. In some cases your citations appear as a link from an underscore symbol, and it is not even clear which statement the citation corresponds to (e.g. “This happens although there is an understanding that [u]nique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe. A recent study of some 40 research data repositories... ” - in this case, I imagine you are citing the data citation principles to justify the first sentence, but it is not clear what citation is included to justify the statement about the 40 data repositories). Please revise all citations and make sure that they are displayed properly.
The problem with the citations may also be the reason why the paper appears to introduce many acronyms without indicating what they stand for. For example, the introduction mentions ORCIDs, RORs, ARK, DOI and UUID, but the references are not included. Please fix these to include citations, especially on the first mention of each acronym. In addition, the paper would benefit from including a glossary listing all the acronyms and their definitions.
I look forward to receiving a thoroughly revised version of your article.
Many thanks,
Alejandra Gonzalez-Beltran (http://orcid.org/0000-0003-3499-8262)