Reviewer has chosen not to be Anonymous
Overall Impression:
Undecided
Technical Quality of the paper:
Clear novelty
Data availability:
Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript:
The length of this manuscript is about right
Summary of paper in a few sentences:
The paper explores how to improve the FAIRness of research software, discussing how applying the FAIR principles to static datasets (“data”) differs from applying them to software. It analyses which existing principles can be applied to software directly and which need to be adapted, reinterpreted, or supplemented with new principles. It summarizes the current state of the debate about FAIR software, arguing that software is not data and discussing whether quality dimensions of software should affect the FAIR principles. The interoperability principles are identified as the most challenging for research software and were the most affected by the proposed “FAIR for software”: I2 was reinterpreted, extended, and split in two; I3 was discarded; and I4S was newly proposed.
Reasons to accept:
In general, the paper is well written and highly significant: it addresses an urgent and long-discussed problem, the application of the FAIR principles to the development of research software. It also provides pointers to recent events and some of their outputs. The work therefore has solid foundations, and the paper describes well the main issues in interpreting the FAIR principles for software. The intention of specializing and detailing each FAIR principle for software is extremely relevant.
Reasons to reject:
- Although Table 2 lists the recent events with discussions around FAIR and software, the paper is quite superficial about the lessons learned from these events.
- Section 2 is poorly structured: it has three paragraphs claiming that software is not data and a single subsection discussing quality dimensions (2.1). Consider restructuring it into something like: 2.1. Lessons learned from FAIR events; 2.2. Distinctions between static datasets and software; 2.3. Quality dimensions in FAIR principles.
- Section 2 lacks foundations for the discussion of whether software is data. In computer science theory, the term “digital data” refers to any sequence of symbols represented in the binary system (0 or 1). Software is a set of instructions that a computer can execute, which is ultimately transformed (even when it is interpreted) into binary executable file(s). According to this definition, software is therefore a special type of digital data (“software is often regarded as a special kind of data”). I understand that the intention of this section is to make explicit the differences between static datasets and software, but in my opinion it makes no sense to state that “software is not data” (the title of the section). We can say that “digital data” is a super-class (an ultimate sortal, a kind) while “software” and “static dataset” are sub-classes (subkinds), i.e., they share the same identity principles and are rigid sortals of "digital data".
- Although it is clear that “Interoperability turned out to be the most challenging principle”, the paper does not revisit existing research on interoperability, particularly the approaches to the different levels of interoperability (e.g., semantic versus syntactic). Such research has long been published in communities that target interoperability within software engineering and information systems, such as I-ESA, Semantics, ISWC, FOIS, and Modelsward.
- The three definitions of “software interoperability” (Section 3.3) lack references and focus only on form (syntactic interoperability); they should also cover semantic interoperability. Consider linking these definitions to the traditional references (IEEE, 1990) and (Heiler, 1995).
- The intention of specializing and detailing each FAIR principle for software is welcome and quite necessary. However, since software is a sub-class (subkind) of data, removing a principle, as is done with I3, would break the specialization relation. Furthermore, the argument for removing the I3 principle is weak: the claim that “Despite all the complexity associated with software dependencies, there are not semantically meaningful information on it” is not true, since software dependencies are “a complex network of interconnected modules that precludes the software building” and therefore do provide semantically meaningful information. In my view, the newly proposed I4S represents a subset of I3 and should therefore be classified as “rephrased and extended”.
- The FAIR Data Point reference implementation is an exemplar of FAIR research software, but it is not cited in the paper. The paper gives no examples of existing research software that is FAIR (or has high FAIRness).
- The problem of what counts as “rich metadata” (which is subjective) is not addressed; F2 is only rephrased from data to software.
- Regarding Figure 1 (interoperability of research software), I wonder whether OS and hardware are parts of the execution environment, while dependencies and instructions are parts of the research software. I recommend changing the figure to make this explicit.
- “Workflow” is defined in the paper as orchestration (or choreography) of services, an approach studied extensively in SOA research, which relies on so-called process (behavior) interoperability. The software interoperability definition (iii) is about process interoperability. Please add a comment on that.
- The idea of specializing I2 into I2S.1 and I2S.2 is very good, but I recommend rephrasing I2S.2 as “Data structure interfaces of software (e.g., input/output of APIs) are formally described using controlled vocabularies that follow the FAIR principles”. This is fundamental to promoting machine-actionable research: it improves the FAIRness of software by making it compliant with open ontologies. Even if the internal data models of two pieces of software differ, their exposed services need to follow these ontologies. The FAIR Data Point, for example, shows how research software should expose its data interfaces following open (and standardized) ontologies (such as DCAT and re3data).
- I recommend FAIRifying Table 1 and making it available (as RDF) as data produced by this research.
- I wonder whether it would make more sense to append the “FAIR for software” principles to the original FAIR principles. Some rephrased principles only replace “data” with “software” and/or reflect an interpretation of the original principle. A (positive) side effect would be to reinforce the urgent need for a governance model for the FAIR principles (as pointed out in the conclusions).
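
As a rough illustration of the recommendation to publish Table 1 as RDF, the sketch below serializes a few principle changes as N-Triples using only the Python standard library. It makes several assumptions: the namespace `https://example.org/fair4rs#` and the `status` and `derivedFrom` properties are hypothetical, not taken from the paper or from any published vocabulary, and the rows cover only the changes named in this review (I2, I3, I4S), not the full table.

```python
# Hypothetical namespace: the paper does not define a vocabulary for Table 1.
NS = "https://example.org/fair4rs#"

# Rows limited to the changes named in this review, not the full Table 1.
# Each row: (original principle or None, status label, resulting principles).
ROWS = [
    ("I2", "split", ["I2S.1", "I2S.2"]),  # reinterpreted, extended and split in two
    ("I3", "discarded", []),              # no software counterpart proposed
    (None, "new", ["I4S"]),               # newly proposed, no original principle
]


def iri(local_name):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{NS}{local_name}>"


def to_ntriples(rows):
    """Return one N-Triples statement per status or derivation fact."""
    triples = []
    for original, status, derived in rows:
        if not derived:
            # No successor: the status attaches to the original principle.
            triples.append(f'{iri(original)} {iri("status")} "{status}" .')
        for principle in derived:
            triples.append(f'{iri(principle)} {iri("status")} "{status}" .')
            if original is not None:
                triples.append(
                    f'{iri(principle)} {iri("derivedFrom")} {iri(original)} .'
                )
    return triples


for line in to_ntriples(ROWS):
    print(line)
```

In a real deposit the hypothetical properties would preferably be replaced by terms from an established, resolvable vocabulary, and the resulting file would be placed in a data repository with its own persistent identifier.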
Meta-Review by Editor
Submitted by Tobias Kuhn on
The reviewers agree that the paper addresses an important area of discussion, namely the relationship between research software and the FAIR data principles. The authors should address the reviewers' comments, in particular with respect to the clarity of writing in some places. It would also be worthwhile to call out some existing related work on interoperability and to acknowledge the fundamental computer science notion that software can be treated as data. I think it is clear in this context that there is a fundamental distinction, but it is worth mentioning. The last thing to consider is whether you can provide an example of FAIR software. Given that this is a position paper, I think it would be acceptable not to have one, but it would be excellent if one could be provided.
Paul Groth (https://orcid.org/0000-0003-0183-6910)