PIDs, please play FAIR and identify yourselves!

Tracking #: 547-1527

Authors:

Name: Joakim Philipson
ORCID: https://orcid.org/0000-0001-5699-994X


Responsible editor: 

Alejandra Gonzalez-Beltran

Submission Type: 

Position Paper

Abstract: 

This is an extended version of [32], first presented at the SAVE-SD 2017 workshop in Perth, Australia. This comprehensively revised and updated version gives an example of how scientific names can provide context and meaning, as a backdrop to the ensuing suggestion that persistent identifiers (PIDs), which now often fail to do so, should also include contextual, semantic elements. As in the original version, the findability and interoperability of some PIDs and their compliance with the FAIR data principles are explored; ARKs have been added in this version. It is suggested that wide distribution and findability on the internet (e.g. by simple 'googling') may be more important for the usefulness of identifiers than the resolvability of PID URI links. This version also supplies new reasoning about the failure to use PIDs such as DOIs for citation, even when they exist. The prevalence of phenomena such as link rot implies that the persistence of URIs cannot be trusted. By contrast, the well-distributed but seldom directly resolvable ISBN identifier has proved remarkably resilient, with far-reaching persistence, inherent structural meaning and good validatability by means of fixed string length, pattern recognition, a restricted character set and a check digit. Various examples of regular expressions used for validation of e.g. DOIs are supplied or referenced here. The suggestion to add context and meaning to PIDs through namespace prefixes and object types, thereby making them "identify themselves", is elaborated further in this version. Meaning can also be conferred by means of structural elements, such as well-defined, restricted string patterns, which at the same time make PIDs more "validatable". This version concludes with a generic, refined model for a PID with these properties, in which namespaces are instrumental as custodians, meaning-givers and validation schema providers.
A draft example of a Schematron schema for validation of new PIDs in accordance with the proposed model is also provided.
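
As an illustration of the "validatability" attributed to the ISBN above, the following minimal Python sketch combines the four properties mentioned: fixed string length, pattern recognition, restricted character set and check digit. It is not taken from the paper. The function name is_valid_isbn13 and the exact pattern are assumptions made here for illustration, and only the 13-digit form of the ISBN is covered.

```python
import re

# Assumed pattern (not from the paper): exactly 13 digits with a 978 or 979 prefix.
ISBN13_PATTERN = re.compile(r"97[89]\d{10}")


def is_valid_isbn13(candidate: str) -> bool:
    """Return True if candidate is a structurally valid ISBN-13 (hypothetical helper)."""
    # Hyphens and spaces are separators only, so strip them before checking.
    digits = re.sub(r"[\s-]", "", candidate)
    # Fixed length, restricted character set and prefix pattern in one step.
    if not ISBN13_PATTERN.fullmatch(digits):
        return False
    # Check digit: weights alternate 1, 3 over the first twelve digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    expected = (10 - total % 10) % 10
    return expected == int(digits[12])
```

A call such as is_valid_isbn13("978-0-306-40615-7") needs no network lookup, which is the point made above: the identifier can be checked against its own structure even when it is not directly resolvable.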

Manuscript: 

Tags: 

  • Under Review

Special issue (if applicable): 

Special issue of Data Science, including a selection of extended papers from SAVE-SD 2017 and 2018

Data repository URLs: 

none

Date of Submission: 

Friday, November 16, 2018