Reviewer has chosen to be Anonymous
Overall Impression:
Undecided
Technical Quality of the paper:
Limited novelty
Data availability:
All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript:
The length of this manuscript is about right
Summary of paper in a few sentences:
The paper tackles the problem of assessing the “FAIRness” of research data and presents a semi-automatic pipeline for evaluating FAIR maturity indicators. The pipeline is demonstrated in a Jupyter notebook and illustrated in two use cases in the domain of Life Sciences. The proposed pipeline follows the principles and guidelines recommended by the maturity indicator authoring group, in addition to integrating concepts from the state of the art. Specifically, the proposed pipeline satisfies 13 FAIR principles and allows for the retrieval of data collections by accessing different data repositories. The metadata describing the process of data collection retrieval is documented in XML. The effectiveness of the proposed pipeline is evaluated in two use cases, and the results of the evaluation are illustrated in a FAIR balloon plot. This plot facilitates the visualization of the analysis of the FAIR maturity indicators during the process of retrieving data collections to answer scientific questions. Finally, two users are consulted to analyze the usability of the proposed pipeline.
Overall, the paper is well written and addresses the problem of ensuring FAIR principles during the semi-automatic execution of a scientific pipeline. The implementation provided as a Jupyter notebook enables the execution of the pipeline and the reproducibility of the results reported in the paper; the Jupyter notebook is not only published in a GitHub repository but can also be run in Binder. Thus, the proposed pipeline is presented as a resource that also follows the FAIR principles. Nevertheless, because of the lack of description of the proposed pipeline, its full potential and limitations are not transparently presented. In particular, the following points are not clear in the paper.
To conclude, the paper presents an approach that has great potential and is relevant to the scientific community. However, the lack of description of the proposed workflow impedes a clear evaluation of its benefits and limitations. These issues reduce the value of the current version of this work and prevent a positive evaluation in terms of generality and innovation. The recommendation is to address these issues and resubmit the paper.
Reasons to accept:
Strong Points (SP)
A resource for evaluating the FAIRness of the data collections retrieved during the execution of a research question.
Live code of the pipeline accessible via a Jupyter notebook.
Clear visualization summarizing the values of the results.
Reasons to reject:
Weak Points (WP)
The components of the pipeline are vaguely defined. It is not clear what the innovation of the proposed workflow is from a computational point of view.
Many issues are not clearly described, which limits the understanding of the potential of the proposed work.
Questions to the authors (QA):
QA1) What is the main component of the workflow implemented in the pipeline presented in this paper?
QA2) How is a research question (e.g., “What are the differentially expressed genes between normal subjects and subjects with Parkinson’s diseases in the brain frontal lobe?”) interpreted?
QA3) Why are state-of-the-art Named Entity Recognition (NER) tools not used to support this task?
QA4) Why only two use cases were selected? Two use cases are not enough to show the features of a given approach.
QA5) Why did only two users evaluate the pipeline? Which criteria were followed to select these two evaluators? Under which conditions was the pipeline evaluated?
QA6) Why are controlled vocabularies and semantic enrichment techniques not utilized to describe the metadata of the datasets?
QA7) What will be the behavior of the proposed workflow if several data collections are relevant for answering a research question? How will the reported measures be computed?