Reviewer has chosen not to be Anonymous
Overall Impression:
Accept
Technical Quality of the paper:
Limited novelty
Data availability:
All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript:
The length of this manuscript is about right
Summary of paper in a few sentences:
The paper makes the case for expanding research in knowledge-based data science. In doing so, it gives a short introduction to knowledge representation, offers a compact yet comprehensive classification of knowledge-based inference, and finally lists open challenges.
Reasons to accept:
The paper highlights a somewhat underrated (in the opinion of the author) aspect of data science, which he calls "Knowledge-based Data Science".
Note that the paper does not make the case for (the success of) Knowledge-based Data Science by itself, but rather it calls for more research into this particular field. I think the latter is a valid position given the current state-of-the-art.
The paper gives a (to my knowledge) novel classification which also serves as a good overview of previous and current knowledge-based inference methods. It is supported by up-to-date references.
Much more can (and probably should) be said about this topic, but this paper provides a necessary and sufficient point of reference for further discussion.
Reasons to reject:
No major reasons to reject, but the paper needs some refinement and clarification of terminology (see Further comments).
Further comments:
I think it would be beneficial to the paper if the author explicitly gave his definitions of the concepts "(big) data", "knowledge" and "reasoning"/"inference" somewhere in the introduction. Presumably the stated examples of "ways in which a computational approach might act as if it knew something" are all different forms of reasoning? Is reasoning the same as "computational processing/manipulation of knowledge"?
There seems to be some unintended repetition or reordering of sentences in the transition between paragraphs 1 and 2: "Much more contemporary ... entail ontological commitments", which makes it unclear how the older and the "much more contemporary" work are related.
"While the Semantic Web standards are intended ... knowledge". However, I miss some acknowledgement of the trend(?) to capture primary research "data" as RDF, which may or may not support later generation of (new) knowledge. In the author's opinion, would that classify as just "(big) data"? Or is it an unintended use of the Semantic Web? Is this already covered by one of the types of inference (Section 2), or is it yet another type?
"As is clear from the NIH BD2K experience, ..." A similar statement is also made in the introduction, but why is it clear? Is there (budgeting) data to support this?
Submitted by Tobias Kuhn on