Scholarly data analysis to aid scientific model development

Tracking #: 549-1529

Responsible editor: 

Silvio Peroni

Submission Type: 

Research Paper


The sharing of scientific and scholarly data has been increasingly promoted over the last decade, leading to open repositories in many different scientific domains. However, data sharing and open data are not goals in themselves; the real benefit lies in data reuse, which leverages investments in research and enables large-scale, data-driven research progress. Focusing on reuse, this paper discusses the design of an integrated framework that automatically takes advantage of large amounts of scholarly scientific data to support research, and in particular scientific model development. Scientific models reproduce and predict complex phenomena, and their development is a challenging task in which scientific experiments play a key role through continuous validation. Starting from the chemical kinetics domain, this paper discusses a set of use cases and a first prototype of such a framework, which lead to a set of functional requirements and an architecture that can easily be generalized to other domains. The paper analyzes the needs, the challenges and the research directions for such a framework, in particular those related to data management, automatic scientific model validation, data aggregation and data analysis, in order to leverage large amounts of scholarly data for new knowledge extraction.



  • Reviewed

Special issue (if applicable): 

Special issue of Data Science, including a selection of extended papers from SAVE-SD 2017 and 2018

Data repository URLs: 


Date of Submission: 

Sunday, December 16, 2018

Date of Decision: 

Wednesday, February 6, 2019



Solicited Reviews:


Meta-Review by Editor

We have received three complete and interesting reviews that should help you expand the paper and fix the issues identified in it. I ask you to consider all the concerns raised by the reviewers, since they are all sensible and important. A major concern, raised by all of them, is the lack of availability of the data and software described in the paper, which is not acceptable for a publication in Data Science. I strongly suggest making them available online, following the FAIR principles and the guidelines used in the ISWC Resource Track (the most recent ones are available at For the latter ones, see in particular the section related to availability. Data (e.g. by publishing them in Figshare or Zenodo) and software (e.g. by using the GitHub+Zenodo feature for assigning DOIs to code) should be appropriately cited in the paper, see and for extensive discussion on the topic.

Silvio Peroni