Towards a scientific data framework to support scientific model development

Tracking #: 562-1542

Responsible editor: 

Silvio Peroni

Submission Type: 

Article in Special Issue (specify below)


The sharing of scientific and scholarly data has been increasingly promoted over the last decade, leading to open repositories in many different scientific domains. However, data sharing and open data are not final goals in themselves; the real benefit is in data reuse, which allows leveraging investments in research and enables large-scale data-driven research progress. Focusing on reuse, this paper discusses the design of an integrated framework to automatically take advantage of large amounts of scientific data extracted from the literature to support research, and in particular scientific model development. Scientific models reproduce and predict complex phenomena and their development is a rather challenging task, within which scientific experiments have a key role in their continuous validation. Starting from the combustion kinetics domain, this paper discusses a set of use cases and a first prototype for such a framework, which leads to a set of new requirements and an architecture that can be generalized to other domains. The paper analyzes the needs, the challenges and the research directions for such a framework, in particular those related to data management, automatic scientific model validation, data aggregation and data analysis, to leverage large amounts of published scientific data for new knowledge extraction.


Supplementary Files (optional): 

Previous Version: 


  • Reviewed

Special issue (if applicable): 

Special issue of Data Science, including a selection of extended papers from SAVE-SD 2017 and 2018

Data repository URLs:

Date of Submission: 

Friday, March 8, 2019

Date of Decision: 

Tuesday, March 19, 2019



1 Comment

Meta-Review by Editor

I thank the authors for providing a revision of their paper that addresses all the points raised by the reviewers.

After reading the revision and the way the authors have addressed the various points raised, I'm in favour of accepting the article. However, there are two points that must be addressed in the camera-ready version, which are crucial from my perspective:

1. Even if the authors said that the ontology they have used in the prototype is just a toy model, I would like to see it published somewhere on the Web, identified by a persistent URL (see for that), and accompanied by appropriate (HTML?) documentation. Even if it is a toy ontology, it can be useful to the community, is part of the data presented in the article, and as such must be available. I suggest using a tool, e.g. Widoco, for creating the documentation directly from the annotations contained in the ontology itself.

2. I urge the authors to publish both the **software** and the **exemplar data** that are included in the GitHub repository (which is a reasonable first step, though) in appropriate repositories, so that they can be assigned a DOI and be cited in the paper, and be cited in the future as proper bibliographic resources. In particular, the fact that the software is only a prototype is not a valid excuse for not assigning a DOI to it. Creating a release in GitHub does not systematically "close" the repository, but rather allows you to set a clear milestone marking the code used in the specific article. Thus, I must ask the authors to create a GitHub release of their code so as to get a DOI from Zenodo, and then cite it properly in the paper. The same thing should be done with the sample data that have been made available online, by uploading them to Zenodo/Figshare and associating the appropriate metadata, so as to cite them as well in the article. Please include the references to the software and the data in the reference list of the article – more info at and

- page 8 line 22: encoding issue at the end of the line

Have a nice day :-)

Silvio Peroni