PIDs, please play FAIR and identify yourselves!

Tracking #: 547-1527

Authors:

Joakim Philipson (https://orcid.org/0000-0001-5699-994X)


Responsible editor: 

Alejandra Gonzalez-Beltran

Submission Type: 

Position Paper

Abstract: 

This is an extended version of [32], first presented at the SAVE-SD 2017 workshop in Perth, Australia. This comprehensively revised and updated version gives an example of how scientific names can provide context and meaning, as a backdrop to the ensuing suggestion that PIDs (persistent identifiers), which now often fail to do so, should also include contextual, semantic elements. As in the original version, the findability and interoperability of some PIDs and their compliance with the FAIR data principles are explored; ARKs have been added in this version. It is suggested that wide distribution and findability on the internet (e.g. by simple 'googling') may be more important for the usefulness of identifiers than the resolvability of PID URI links. This version also supplies new reasoning about the failure to use PIDs such as DOIs for citation, even when they exist. The prevalence of phenomena such as link rot implies that the persistence of URIs cannot be trusted. By contrast, the well-distributed but seldom directly resolvable ISBN identifier has proved remarkably resilient, with far-reaching persistence, inherent structural meaning and good validatability, by means of fixed string length, pattern recognition, a restricted character set and a check digit. Various examples of regular expressions used for validation of e.g. DOIs are supplied or referenced here. The suggestion to add context and meaning to PIDs through namespace prefixes and object types, thereby making them "identify themselves", is elaborated further in this version. Meaning can also be conferred by structural elements, such as well-defined, restricted string patterns, which at the same time make PIDs more "validatable". This version concludes with a generic, refined model for a PID with these properties, in which namespaces are instrumental as custodians, meaning-givers and validation schema providers. 
A draft example of a Schematron schema for validation of new PIDs in accordance with the proposed model is also provided.
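
To illustrate the kind of validatability the abstract attributes to the ISBN (fixed string length, restricted character set, recognisable pattern and check digit), here is a minimal sketch for ISBN-13. It is not taken from the manuscript: the function name `is_valid_isbn13` and the decision to accept hyphens or spaces as separators are assumptions made for illustration only.

```python
import re

# Assumption: an ISBN-13 is 13 digits starting with the 978 or 979 prefix,
# optionally written with hyphens or spaces between its groups.
ISBN13_DIGITS = re.compile(r"97[89]\d{10}")


def is_valid_isbn13(candidate: str) -> bool:
    """Return True if `candidate` is a structurally valid ISBN-13 (hypothetical helper)."""
    # Strip the permitted separators; any other non-digit character
    # will make the pattern check below fail.
    digits = re.sub(r"[\s-]", "", candidate)

    # Fixed length, restricted character set and prefix pattern in one check.
    if not ISBN13_DIGITS.fullmatch(digits):
        return False

    # Check digit: weight the first 12 digits alternately by 1 and 3,
    # then the 13th digit must bring the total to a multiple of 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    expected_check = (10 - total % 10) % 10
    return expected_check == int(digits[12])


if __name__ == "__main__":
    print(is_valid_isbn13("978-3-16-148410-0"))
    print(is_valid_isbn13("978-3-16-148410-1"))
```

A check of this kind needs no network access and no resolver, which is the sense in which the abstract calls such identifiers "validatable" independently of their resolvability.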

Manuscript: 

Tags: 

  • Reviewed

Special issue (if applicable): 

Special issue of Data Science, including a selection of extended papers from SAVE-SD 2017 and 2018

Data repository URLs: 

none

Date of Submission: 

Friday, November 16, 2018

Date of Decision: 

Tuesday, January 29, 2019

Decision: 

Undecided

Solicited Reviews:



Meta-Review by Editor

Overall, the majority of reviewers have indicated several weaknesses of the paper, highlighting multiple points that must be addressed before the paper can be considered for publication.

 

Reviewer #3 was unfortunately very brief, but when asked for more details, indicated:

“I do not believe including extensive semantics in the identifiers is a good idea. We are trying to move away from this even though it is very difficult. I would like to see a common service model to retrieving metadata about identifiers. The paper also proposes a model but there is no technical solution behind this model to find out how difficult or easy to generate or maintain such a semantically rich identifier. The paper does not describe which community is interested in taking up such a semantically rich identifier scheme.”

 

My overall opinion is that while the paper discusses some potentially interesting points with respect to persistent identifiers and the FAIR principles, it has many drawbacks and confuses some concepts. My recommendation is therefore that the paper undergo a major revision addressing all the reviewers’ comments as well as the points I list below. After that, it will be sent for a second round of reviews.

 

Thus, in addition to all the reviewers’ comments, and in particular the detailed points made by Reviewer #4, I would ask you to also consider and address the following comments:

 

  • You mention that “the FAIR principles do not say anything explicitly about validation”. However, to be reusable (in R1.3), the FAIR principles require (meta)data to meet domain-relevant community standards. This means precisely that “metadata can be properly validated against a schema, as adhering to an accepted metadata standard”, which you indicate is not covered. For more details on this, you can check the ongoing work on implementing the FAIR principles (some of which you included as references, e.g. FAIR metrics).

  • You refer to identifier validation and patterns for identifier types. Such patterns are maintained at identifiers.org (see e.g. https://www.ebi.ac.uk/miriam/main/collections/MIR:00000110). Have you considered that?

  • You refer to ordinary or plain URIs and resolvable URIs. I recommend checking ‘Study on Persistent URIs’ (https://philarcher.org/diary/2013/uripersistence/), which provides information on persistent identifiers in the context of the Web architecture. I recommend you follow the terminology defined there. Another important reference is ‘Cool URIs don’t change’ (https://www.w3.org/Provider/Style/URI).

  • I also recommend you check the paper “Identifiers for the 21st century: How to design, provision and reuse persistent identifiers to maximize utility and impact of life science data” (https://doi.org/10.1371/journal.pbio.2001414, disclaimer: I’m one of the authors). You suggest adding context to the identifiers. Please check Lesson 4 as well as, e.g., the OBO Foundry identifier policy (http://www.obofoundry.org/id-policy.html) for resources explaining why identifiers should be opaque.

  • While the paper aims to discuss identifiers in the context of the FAIR principles, the principles themselves are interpreted without reference to the different elements in their definition (which you included in Figure 1). For example, when discussing findability, you refer to ‘googling’ rather than analysing the four elements the principle actually specifies for being findable (F1, F2, F3, F4).

  • The whole text is driven by examples used to discuss the arguments you make, and some conceptual elements are conflated and/or confused (see the points above and Reviewer #4’s points). Instead, I recommend discussing strong conceptual points, which should be supported by the examples, rather than the other way around.

  • As indicated by the reviewers, you need to differentiate your proposed model from existing ones, explaining how it addresses their drawbacks and what the strengths of the new model are. Please also address where the model is or will be applied and which community it is aimed at.

  • Please add a conclusion section summarising the paper’s contributions.

  • Please also revise the text (e.g. there are typos such as ‘homonymi’ instead of ‘homonymy’).

 

Many thanks,


Alejandra Gonzalez-Beltran (http://orcid.org/0000-0003-3499-8262)