Towards FAIR principles for research software

Tracking #: 606-1586

Authors:

	Name	ORCID
	Anna-Lena Lamprecht	https://orcid.org/0000-0003-1953-5606
	Leyla Jael Castro	https://orcid.org/0000-0003-3986-0510
	Mateusz Kuzak	https://orcid.org/0000-0003-0087-6021
	Carlos Martinez	https://orcid.org/0000-0001-5565-7577
	Ricardo Arcila	https://orcid.org/0000-0002-8253-7375
	Eva Martin	https://orcid.org/0000-0001-8324-2897
	Victoria Dominguez Del Angel	https://orcid.org/0000-0002-5514-6651
	Stephanie van de Sandt	https://orcid.org/0000-0002-9576-1974
	Jon Ison	https://orcid.org/0000-0001-6666-1520
	Paula Andrea Martinez	https://orcid.org/0000-0002-8990-1985
	Peter McQuilton	https://orcid.org/0000-0003-2687-1982
	Alfonso Valencia	https://orcid.org/0000-0002-8937-6789
	Jennifer Harrow	https://orcid.org/0000-0003-0338-3070
	Fotis Psomopoulos	https://orcid.org/0000-0002-0222-4273
	Josep Ll. Gelpi	https://orcid.org/0000-0002-0566-7723
	Neil Chue Hong	https://orcid.org/0000-0002-8876-7606
	Carole Goble	https://orcid.org/0000-0003-1219-2137
	Salvador Capella-Gutierrez	https://orcid.org/0000-0002-0309-604X

Responsible editor:

Paul Groth

Submission Type:

Position Paper

Abstract:

The FAIR Guiding Principles, published in 2016, aim to improve the findability, accessibility, interoperability and reusability of digital research objects for both humans and machines. Until now the FAIR principles have been mostly applied to research data. The ideas behind these principles are, however, also directly relevant to research software. Hence there is a distinct need to explore how the FAIR principles can be applied to software. In this work, we aim to summarize the current status of the debate around FAIR and software, as a basis for the development of definite community-agreed principles for FAIR research software in the future. We discuss what makes software different from data with respect to the application of the FAIR principles, present an analysis of where the existing principles can directly be applied to software, where they need to be adapted or reinterpreted, and where the definition of additional principles is required. Furthermore, we discuss desired characteristics of research software that go beyond FAIR.

Manuscript:

ds-paper-606.zip

Special issue (if applicable):

FAIR Data, Systems and Analysis

Data repository URLs:

none

Date of Submission:

Friday, August 16, 2019

Date of Decision:

Friday, October 4, 2019

Nanopublication URLs:

Decision:

Solicited Reviews:

Review #1 submitted on 04/Sep/2019

By Zhiming Zhao

https://orcid.org/0000-0002-6717-9418

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Undecided
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: High significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: This manuscript is too long for what it presents and should therefore be considerably shortened (below the general length limit)

Summary of paper in a few sentences:

The paper discusses an important topic in the context of data management and open science: the FAIRness of research software. It reviews the current FAIR principles and discusses the changes that may be needed to meet the different requirements of FAIR research software. In general, the paper is clearly structured and readable.

Reasons to accept:

Revision is needed before acceptance, and the changes listed below should be made. For the detailed reasons, see the comments below.

1. In the paper, the authors highlight that the key differences between data and software are the dynamic nature and the executability of software. However, the discussion is too abstract and lacks detail. Research software can exist in different packaging forms, e.g., as code, executables, services, containers, workflows, etc. Some of these forms, like containers, can capture the runtime context well and can partially automate deployment and execution. Do all of these forms of research software require the same changes to the current FAIR principles?
2. Looking at Table 1, some changes merely replace “data” with “research software” in the original FAIR principles, or make “research software” explicit in the statement, e.g., F2, F4, etc. In these cases, the “rephrasing” is more like an instantiation of a general principle for a specific digital object, without actually changing its original definition.
3. Can the authors elaborate on the extent to which the extended changes (I1, I4S) in Table 1 can be addressed by providing rich contextual metadata for research software, e.g., dependencies, QoS, etc.? If they can, those changes might also be seen as an “instantiation” of the general principles. If not, why not? This would help the reader understand why these changes are needed for FAIR research software.
4. In Section 2, the authors state that required access control for individual data sets (e.g., patients' electronic health records, genomics sequences) has prevented open data from becoming a FAIR principle (see go-fair FAQ [26]). This sentence implies that it is access control that prevents open data from becoming FAIR compliant. From [26], we can see that FAIR data does not have to be open, and open data is not necessarily FAIR. FAIR data can be access-controlled, depending on the specific phase in its lifecycle, and open data is not FAIR compliant if it lacks rich metadata for reusability and interoperability. The authors therefore need to formulate this sentence more carefully.

Reasons to reject:

The paper still needs some refinement before being published.

Nanopublication comments:

Further comments:

Review #2 submitted on 01/Oct/2019

By João Moreira

https://orcid.org/0000-0002-4547-7000

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Average
Reviewer's confidence: High
Significance: High significance
Background: Reasonable
Novelty: Clear novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper explores how to improve the FAIRness of research software, discussing the differences between applying the FAIR principles to static datasets (“data”) and to software. The paper presents an analysis of how existing principles can be directly applied to software, and discusses the need to adapt, reinterpret or add principles. The paper summarizes the current state of the debate about FAIR software, arguing that software is not data and discussing whether quality dimensions of software should impact the FAIR principles. The interoperability principles are pointed out as the most challenging ones for research software and are the most affected by the proposed “FAIR for software”: I2 was reinterpreted, extended and split in two, I3 was discarded and I4S was newly proposed.

Reasons to accept:

In general, the paper is well written and has a high significance, addressing an urgent problem under discussion (for a long time): the application of FAIR principles in the development of research software. Moreover, it provides pointers to the recent events and some of their outputs. Therefore, the work performed has solid foundations and the paper describes well the main issues when interpreting the FAIR principles for software. The intention of specializing/detailing each FAIR principle for software is extremely relevant.

Reasons to reject:

- Although the paper provides Table 2, which lists recent events with discussions around FAIR and software, it is quite superficial about the lessons learned from these events.

- Section 2 is poorly structured: it has three paragraphs claiming that software is not data and one subsection discussing quality dimensions (2.1). Consider restructuring it into something like: 2.1. Lessons learned from FAIR events; 2.2. Distinctions between static datasets and software; 2.3. Quality dimensions in FAIR principles.

- Section 2 lacks foundations for the discussion of whether software is data or not. In computer science theory, the term “digital data” refers to any sequence of symbols that is represented using the binary system (0 or 1). Software is a set of instructions that a computer can execute, which is ultimately transformed (even if it is interpreted) into binary executable file(s). Therefore, according to this definition, software is indeed a special type of digital data (“software is often regarded as a special kind of data”). I understand that the intention of this section is to make explicit the differences between static datasets and software, but in my opinion it makes no sense to state that “software is not data” (the title of the section). We can say that “digital data” is a super-class (an ultimate sortal, i.e., a kind) while “software” and “static dataset” are sub-classes (subkinds), i.e., they share the same identity principles and they are rigid sortals of “digital data”.

- Although it is clear that “Interoperability turned out to be the most challenging principle”, the paper does not revisit existing research on interoperability, particularly the approaches to the different levels of interoperability (e.g., semantic versus syntactic). This research is commonly found in communities that have long targeted interoperability within software engineering and information systems, such as I-ESA, Semantics, ISWC, FOIS, Modelsward, etc.

- The three definitions of “software interoperability” (Section 3.3) lack references and focus only on form (syntactic interoperability); they should also cover semantic interoperability. Consider linking these definitions to the traditional references (IEEE, 1990) and (Heiler, 1995).

- The intention of specializing/detailing each FAIR principle for software is good and quite necessary. However, since software is a sub-class (subkind) of data, removing a principle would break the specialization relation, as in the case of I3. Furthermore, the argument for removing the I3 principle is weak: the claim that “Despite all the complexity associated with software dependencies, there are not semantically meaningful information on it” is not true, since software dependencies are “a complex network of interconnected modules that precludes the software building”, and therefore they provide semantically meaningful information. In my view, the newly proposed I4S represents a subset of I3 and should therefore be classified as “rephrased and extended”.

- The FAIR Data Point reference implementation is an exemplary piece of FAIR research software, but it is not cited in the paper. The paper does not give examples of existing research software that is FAIR (or has high FAIRness).

- The problem of “what is rich metadata” (which is subjective) is not addressed; the principle is only rephrased from data to software (F2).

Nanopublication comments:

Further comments:

- Regarding Figure 1 (interoperability of research software), I wonder whether OS and hardware are parts of the execution environment, while dependencies and instructions are parts of the research software. I recommend changing the figure to make this explicit.

- “Workflow” is defined in the paper as orchestration (or choreography) of services, an approach extensively studied in SOA research, which relies on so-called process (behavior) interoperability. The software interoperability definition (iii) is about process interoperability. Please add a comment on that.

- The idea of specializing I2 into I2S.1 and I2S.2 is very good, but I recommend rephrasing I2S.2 as “Data structure interfaces of software (e.g., input/output of APIs) are formally described using controlled vocabularies that follow the FAIR principles”. This is a fundamental point for promoting machine-actionable research, improving the FAIRness of software by making it compliant with open ontologies. Even if their internal data models are different, the exposed services need to follow these models. For example, the FAIR Data Point is a good example of how research software should expose its data interfaces following open (and standardized) ontologies (such as DCAT and re3data).

- I recommend FAIRifying Table 1 and making it available (as RDF) as data produced by this research.

- I wonder whether it would make more sense to append the “FAIR for software” principles to the original FAIR principles. Some rephrased principles only replace “data” with “software” and/or reflect the interpretation of the original principle. A (positive) side effect would be to reinforce the urgent need for a governance model for the FAIR principles (as pointed out in the conclusions).

Review #3 submitted on 03/Oct/2019

By Remzi Celebi

https://orcid.org/0000-0001-7769-4272

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Excellent
Presentation: Good
Reviewer's confidence: Medium
Significance: High significance
Background: Comprehensive
Novelty: Clear novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This manuscript highlights the differences between software and static data and discusses how the FAIR principles should be applied to software. It provides an analysis of how the FAIR principles can be adapted, reinterpreted or directly applied to software, and where additional principles need to be defined.

Reasons to accept:

The paper provides new insights on how to interpret the FAIR concepts for research software. The ideas in the paper are innovative. The adaptation of the FAIR principles to software, and the reasons for it, are the main contributions of the paper. This work also provides the foundations for developing metrics and associated maturity models for FAIR software.

Reasons to reject:

1- Section 2 in general requires better organisation. The whole section seems to be poorly structured. The proposed distinction between software and data is a bit superficial. Some of the differences mentioned apply to data as well (e.g., versioning), but I see no point in extending these principles in this direction.
2- I recommend that the proposed new I4S be part of I3. The claim that "dependencies of a software are not semantically related" for I3 seems very weak. In addition, other information, such as information about how a piece of software interacts with other software, can be included in this category alongside software dependencies.
3- This paper does not include an example of FAIR software. I think the authors should provide one or more examples of FAIR software to make the concepts clearer. I would also like to see a discussion of how to assess the FAIRness of such software.

Nanopublication comments:

Further comments:


Meta-Review by Editor

Submitted by Tobias Kuhn on Fri, 10/04/2019 - 07:01

The reviewers agree that the paper addresses an important area of discussion, namely the relationship between research software and the FAIR data principles. The authors should address the reviewers' comments, in particular with respect to the clarity of writing in some places. It would also be worthwhile to call out some existing related work around interoperability and to acknowledge the fundamental computer science notion that software can be treated as data. I think it is clear in this context that there is a fundamental distinction, but it is worth mentioning. The last thing to consider is whether you can provide an example of FAIR software. Given that this is a position paper, I think it would be acceptable not to have one, but it would be excellent if one could be provided.

Paul Groth (https://orcid.org/0000-0003-0183-6910)

Data Science
