The sharing of scientific and scholarly data has been increasingly promoted over the last decade, leading to open repositories in many scientific domains. However, data sharing and open data are not goals in themselves: the real benefit lies in data reuse, which leverages investments in research and enables large-scale, data-driven research progress.
Focusing on reuse, this paper discusses the design of an integrated framework that automatically takes advantage of large amounts of scholarly scientific data to support research, and in particular scientific model development. Scientific models reproduce and predict complex phenomena, and developing them is a challenging task in which scientific experiments play a key role by continuously validating the models.
Starting from the chemical kinetics domain, this paper discusses a set of use cases and a first prototype for such a framework, which lead to a set of functional requirements and an architecture that can easily be generalized to other domains. The paper analyzes the needs, challenges, and research directions for such a framework, in particular those related to data management, automatic scientific model validation, data aggregation, and data analysis, with the aim of leveraging large amounts of scholarly data for new knowledge extraction.
Special issue:
Special issue of Data Science, including a selection of extended papers from SAVE-SD 2017 and 2018