Abstract:
The sharing of scientific and scholarly data has been increasingly promoted over the last decade, leading to open repositories in many different scientific domains. However, data sharing and open data are not final goals in themselves; the real benefit lies in data reuse, which leverages investments in research and enables large-scale, data-driven research progress.
Focusing on reuse, this paper discusses the design of an integrated framework that automatically takes advantage of large amounts of scientific data extracted from the literature to support research, and in particular scientific model development. Scientific models reproduce and predict complex phenomena; their development is a rather challenging task, in which scientific experiments play a key role through continuous validation.
Starting from the combustion kinetics domain, this paper discusses a set of use cases and a first prototype of such a framework, which lead to a set of new requirements and an architecture that can be generalized to other domains. The paper analyzes the needs, the challenges, and the research directions for such a framework, in particular those related to data management, automatic scientific model validation, data aggregation, and data analysis, so that large amounts of published scientific data can be leveraged for new knowledge extraction.
Meta-Review by Editor
Submitted by Tobias Kuhn on
I thank the authors for having provided a revision of their paper that addresses all the points raised by the reviewers.
After reading the revision and the way the authors have addressed the various points raised, I am in favour of accepting the article. However, from my perspective there are two crucial points that must be addressed in the camera-ready version:
1. Even though the authors state that the ontology used in the prototype is just a toy model, I would like to see it published somewhere on the Web, identified by a persistent URL (see w3id.org for that) and accompanied by appropriate (HTML?) documentation. Even if it is a toy ontology, it can be useful to the community; it is part of the data presented in the article and as such must be available. I suggest using a tool such as Widoco (https://github.com/dgarijo/Widoco) to create the documentation directly from the annotations contained in the ontology itself.
2. I urge that both the **software** and the **exemplar data** included in the GitHub repository (which is a reasonable first step) be published in appropriate repositories, so that they can be assigned a DOI, be cited in the paper, and be cited in the future as proper bibliographic resources. In particular, the fact that the software is only a prototype is not a valid excuse for not assigning a DOI to it. Creating a release in GitHub does not "close" the repository; rather, it lets you set a clear milestone marking the exact code used in this specific article. Thus, I must ask the authors to create a GitHub release of their code so as to obtain a DOI from Zenodo, and then cite it properly in the paper. The same should be done with the sample data that have been made available online: upload them to Zenodo/Figshare with appropriate metadata so that they can also be cited in the article. Please include the references to the software and the data in the reference list of the article; more information at https://peerj.com/articles/cs-1/ and https://peerj.com/articles/cs-86/.
Minor:
- page 8, line 22: encoding issue at the end of the line
Have a nice day :-)
Silvio Peroni (https://orcid.org/0000-0003-0530-4305)