Title: The Knowledge Graph as the Default Data Model for Machine Learning on Heterogeneous Knowledge
Authors: Xander Wilcke (*), Peter Bloem, and Victor de Boer
Journal: Data Science
Year: 2017
Type: Position paper
Version: Final v3
Concerns: cover letter addressing the points raised by the three reviewers (2nd review round).
================================================
We would like to thank the three reviewers for their careful and useful reviews. Based on the points raised by each of
the reviewers, we have updated our paper. For each point, we first list the main issue (>>>ISSUE:), followed by our
response and adaptations.
================================================
>>>ISSUE:
Editor: "Remove the word 'default' from the title"
Reviewer 1: "The title might have been too optimistic. [...] to claim that it should be the default data model is perhaps too strong a statement."
Reviewer 2: "The title is too broad and does not accurately reflect the position in the paper."
We agree with the concerns about the original title. However, a title without the word 'default' would, we feel, reduce the title to a vacuous truth: it will be no surprise to anyone that one can perform machine learning on knowledge graphs. We specifically argue for a _broader_ use of knowledge graphs in machine learning, and the title should reflect that. Our solution is as follows:
- We have included "heterogeneous knowledge" in the title to make it more specific to our message.
- We have included a paragraph in the introduction that defines what we mean by _default_: not a solution for all use cases, but a good first choice.
We realize that we are explicitly ignoring the editor's request in this draft. If this title is not acceptable, we suggest the following instead:
"On the Knowledge Graph as the Data Model for Learning on Heterogeneous Knowledge"
------------------------------------------------
>>>ISSUE:
Editor: "Extend the discussion to further address contextual limitations of knowledge graphs."
Reviewer 3: "[...] more discussion on when to use a knowledge graph and when not to [...]"
Our position is that for the majority of use cases, knowledge graphs can be used, with two important caveats:
- The granularity with which knowledge is encoded into a graph must be carefully chosen.
- The benefits of knowledge graphs only become apparent when dealing with heterogeneous knowledge. For homogeneous knowledge there may be no performance benefits, but there are no disadvantages either.
We have adapted the introduction and section 3.4 to highlight these caveats.
------------------------------------------------
>>>ISSUE:
Reviewer 3: "[...] more thorough discussion of the limitations w.r.t. the challenges. [...] distinguish what current approaches [...] are already capable of doing, what they might be extended to, and what might be the hard challenges for which no straightforward solution exists.[...]"
We have added a subsection (5.4) with a brief description of how current methods address the challenges described in section 4. This section also provides some indication of which challenges are simple to overcome and which are more fundamental.
------------------------------------------------
>>>ISSUE:
Reviewer 3: "[...] I have my doubts that simply using a deep neural net will solve the issues. For example, for data with different modeling paradigms [...]. Without a significant overlap of pairs of instances that use *both* properties simultaneously (also indirectly by interlinking instances in both datasets), it will be difficult to learn that they refer to the same property.[...]"
We agree that some challenges, specifically the issue of differently modeled knowledge, are deep problems, and may in some cases be insurmountable. We have made this more explicit in section 4.4 and referenced active learning as a potential middle ground between integrating data by hand and learning the integration with end-to-end models where necessary. Crucially, such solutions are still helped by a simple low-level integration of data from different sources, which the knowledge graph model provides.
================================================