The sharing of scientific and scholarly data has been increasingly promoted over the last decade, leading to open repositories in many different scientific domains. However, data sharing and open data are not goals in themselves; the real benefit lies in data reuse, which makes it possible to leverage research investments and enables large-scale, data-driven research progress.
Focusing on reuse, this paper discusses the design of an integrated framework that automatically exploits large amounts of scientific data extracted from the literature to support research, and in particular scientific model development. Scientific models reproduce and predict complex phenomena. Developing them is a challenging task, in which scientific experiments play a key role by enabling the models' continuous validation.
Starting from the combustion kinetics domain, this paper discusses a set of use cases and a first prototype of such a framework, which lead to a set of new requirements and to an architecture that can be generalized to other domains. The paper analyzes the needs, challenges, and research directions for such a framework, in particular those related to data management, automatic scientific model validation, data aggregation, and data analysis, with the aim of leveraging large amounts of published scientific data to extract new knowledge.