Knowledge-based Data Science

Tracking #: 419-1399

Authors:

	Name	ORCID
	Lawrence Hunter	https://orcid.org/0000-0003-1455-3370

Responsible editor:

Tobias Kuhn

Submission Type:

Position Paper

Abstract:

Computational manipulation of knowledge is an important, and often underappreciated, aspect of biomedical Data Science. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion, and expanded application, concluding with the key questions that will drive knowledge-based data science into the mainstream of biomedical research.

Manuscript:

ds-paper-419.pdf

Supplementary Files (optional):

ds-supplementary-419.zip

Data repository URLs:

none

Date of Submission:

Thursday, February 16, 2017

Date of Decision:

Wednesday, March 1, 2017

Nanopublication URLs:

Decision:

Solicited Reviews:

Review #1 submitted on 24/Feb/2017

By Mark Thompson

https://orcid.org/0000-0002-7633-1442

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: High significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper makes the case for expanding research in knowledge-based data science. In doing so, it gives a short introduction to knowledge representation, presents a compact yet comprehensive classification of knowledge-based inference, and finally lists open challenges.

Reasons to accept:

The paper highlights a somewhat underrated (in the opinion of the author) aspect of data science, which he calls "Knowledge-based Data Science".

Note that the paper does not make the case for (the success of) Knowledge-based Data Science by itself, but rather it calls for more research into this particular field. I think the latter is a valid position given the current state-of-the-art.

The paper gives a (to my knowledge) novel classification which also serves as a good overview of previous and current knowledge-based inference methods. It is supported by up-to-date references.

Much more can (and probably should) be said about this topic, but this paper provides a necessary and sufficient point of reference for further discussion.

Reasons to reject:

No major reasons to reject, but the paper needs some refinement and clarification of terminology (see Further comments).

Nanopublication comments:

Further comments:

Introduction:
I think it would be beneficial to the paper if the author explicitly gives his definition of the concepts "(big) data", "knowledge" and "reasoning"/"inference" somewhere in the introduction. Presumably the stated examples of "ways in which a computational approach might act as if it knew something" are all different forms of reasoning? Is reasoning the same as "computational processing/manipulation of knowledge"?

Section 1:
There seems to be some unintended repetition or reordering of sentences in the transition between paragraphs 1 and 2: "Much more contemporary ... entail ontological commitments", which makes it unclear how the older work and the "much more contemporary work" are related.

Section 1:
"While the Semantic Web standards are intended.. knowledge". However, I miss some acknowledgement of the trend(?) to capture primary research "data" as RDF, which may or may not support later generation of (new) knowledge. In the author's opinion, would that classify as just "(big) data"? Or is it an unintended use of the Semantic Web? Is this already covered by one of the types of inference (Section 2), or is it yet another type?

Section 3:
"As is clear from the NIH BD2K experience, .." A similar statement is also made in the introduction, but why is it clear? Is there (budgeting) data to support this?

Review #2 submitted on 24/Feb/2017

By Pascale Gaudet

https://orcid.org/0000-0003-1813-6857

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Excellent
Presentation: Excellent
Reviewer's confidence: High
Significance: High significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This paper describes the current progress in knowledge representation for the data sciences. Its sections focus on knowledge representation, inference, and hypothesis generation using computational representations of knowledge. The final section describes open challenges in the domain.

Reasons to accept:

This paper gives a useful overview of the important domain of knowledge representation in biological research, and will be very informative to the readership of Data Science.

Reasons to reject:

none

Nanopublication comments:

Further comments:

I noticed some small typos:
1. Uniprot -> UniProt
2. e.g.. -> remove extra period

Review #3 submitted on 28/Feb/2017

By Juan Banda

https://orcid.org/0000-0001-8499-824X

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Excellent
Suggested Decision: Accept
Technical Quality of the paper: Excellent
Presentation: Excellent
Reviewer's confidence: High
Significance: High significance
Background: Reasonable
Novelty: Clear novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

In this position paper, the author presents his view on the current state of knowledge-based data science and, most importantly, the shortcomings that need attention moving forward. The paper provides a brief outline of the representation of biomedical knowledge, from ontologies to the Semantic Web. The author then frames the state of knowledge-based inference, providing context for logical inference, inference from ontology annotation and biomedical literature, and hypothesis generation. After framing the current state of knowledge-based data science in the biomedical context, the author offers very insightful opinions on the open challenges in the field.

Reasons to accept:

The paper is very well outlined. It provides a brief window onto the relevant literature, so that a new practitioner can get a grasp of the state of the field. It also gives existing researchers concise insight into the current issues and major directions of the field. The open questions posed in this paper are very interesting and relevant potential research directions that should be clear to people entering the field or looking for new problems to address.

Reasons to reject:

Nanopublication comments:

Further comments:

1 Comment

Link to Final PDF and JATS/XML Files

Submitted by Tobias Kuhn on Wed, 07/04/2018 - 08:36

https://github.com/data-science-hub/data/tree/master/publications/1-1-2/ds-1-1-2-ds001
