The Knowledge Graph as the Default Data Model for Machine Learning

Tracking #: 439-1419

Authors:

	Name	ORCID
	Xander Wilcke	https://orcid.org/0000-0003-2415-8438
	Peter Bloem	https://orcid.org/0000-0002-0189-5817
	Victor de Boer	https://orcid.org/0000-0001-9079-039X

Responsible editor:

Michel Dumontier

Submission Type:

Position Paper

Abstract:

In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details they thought salient, they now prefer the data as raw as possible. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. In some areas, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes and internal relations. The Semantic Web community has invested decades of work on just this problem: how to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. This work has led to the Linked Open Data Cloud, a vast and distributed knowledge graph. If we can develop methods that operate on this raw form of data - the knowledge graph - we can dispense with a great deal of ad-hoc feature engineering and train deep models end-to-end in many more domains. In this position paper, we describe current research in this area and discuss some of the promises and challenges of this approach.

Manuscript:

ds-paper-439.pdf

Revised Version:

The Knowledge Graph as the Default Data Model for Machine Learning

Data repository URLs:

none

Date of Submission:

Monday, April 10, 2017

Date of Decision:

Tuesday, April 25, 2017

Nanopublication URLs:

Decision:

Undecided

Solicited Reviews:

Review #1 submitted on 13/Apr/2017

By Robert Hoehndorf

https://orcid.org/0000-0001-8149-5890

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: High
Significance: High significance
Background: Reasonable
Novelty: Clear novelty
Data availability: With exceptions that are admissible according to the data availability guidelines, all used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The manuscript titled "The Knowledge Graph as the Default Data Model
for Machine Learning" describes a vision for data science in which all
information is generally represented in the form of knowledge graphs,
and machine learning algorithms are built that specifically utilize
information in such knowledge graphs. The authors focus mainly on
learning from structured information but also highlight the challenges
of utilizing unstructured information (e.g., string values) contained
within knowledge graphs for representation/feature learning.

Reasons to accept:

The manuscript is very timely and fits very well with the scope of the
journal and current research directions on machine learning with
knowledge graphs. The authors highlight the state of the art in
representation learning on knowledge graphs, including recent work on
applying graph convolution to knowledge graphs, and discuss several
examples that highlight the challenges that learning from knowledge
graph can address. I particularly like the discussion of what graph
convolution can bring to learning with knowledge graphs, i.e.,
end-to-end learning, something that is not easily achievable with many
other methods. It would be good to expand on this.

Reasons to reject:

The manuscript, while addressing an important topic, is not very
convincing in many points; it also overstates several issues and lacks
support for many of its claims.

The manuscript does not distinguish between "data" and "knowledge",
although this distinction is important to make. The authors switch
between both as if these were the same, and I don't think the authors
would like to make this claim. Knowledge graphs, as currently used, do
not contain "raw" data (as the authors seem to claim); instead, they
contain facts and conceptual knowledge (or background knowledge). The
authors seem to be aware of this, but conflate both notions
frequently. Would a knowledge graph, for example, be useful to
represent image data in "raw" form? While it may, in principle, be
possible to represent an image in raw form as RDF (for example, by
representing each pixel as an entity and linking to its RGB value and
coordinates), it is very unlikely that this would be a useful
representation (not even mentioning the resource requirements of storing
image data or other large datasets as RDF). The same holds true for
many kinds of data, such as videos, audio, genomic sequences, etc.,
which are not only too large to represent as RDF but also simply not
suitable for such a representation. This also casts doubts on the
position taken in the paper (demonstrated in the title) that knowledge
graphs should be the default data model for (all) machine learning.
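The resource argument can be made concrete with a back-of-the-envelope count. This minimal sketch assumes a hypothetical pixel modelling with exactly five triples per pixel (x, y, red, green, blue) and ignores type triples, IRIs and serialization overhead, so it if anything understates the real cost.

```python
def pixel_triples(width: int, height: int, triples_per_pixel: int = 5) -> int:
    """Triples needed if every pixel is an entity with x, y, red, green
    and blue properties (a hypothetical modelling, one triple each)."""
    return width * height * triples_per_pixel


def raw_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size of the same image as an uncompressed 8-bit RGB array."""
    return width * height * channels


w, h = 1024, 1024
print(pixel_triples(w, h))  # 5242880 triples
print(raw_bytes(w, h))      # 3145728 bytes
```

A single one-megapixel image already becomes over five million statements, each carrying IRIs and literals, whereas the raw array fits in about 3 MB.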

Much more likely, there will be machine learning algorithms that take
the "raw" data (such as images, videos, etc.), extract knowledge from
those and represent them as facts (in knowledge graphs, or RDF), and
learning on knowledge graphs can then combine this information with
background knowledge, other facts, etc. The authors miss the
opportunity to make this point here. There is also great potential
to link machine learning models on "raw" data (images, etc.) to models
that can learn on knowledge graphs in an end-to-end way, which is something the authors
seem to be aware of (in the discussion of graph convolution which can
be directly connected to other models in an end-to-end fashion), but
fail to mention or discuss.

The authors focus a lot of attention on the fact that knowledge graphs
can replace feature extraction. This should be stated more precisely;
identifying which information to include in a knowledge graph can
still be considered to be a form of feature selection, and it still
requires data conversion and possibly also some form of "extraction"
(e.g., discretizing continuous values, or simply representing any
information as facts). Even in the example the authors give about
emails, the structure of an email (subject, date, recipients, cc,
etc.) must first be mapped to a knowledge graph before any knowledge
graph machine learning methods can be applied. This mapping can be
seen as a form of feature extraction, and in less structured datasets,
more work is required to perform this mapping.
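A minimal sketch of such a mapping makes the point visible: even for a well-structured email, someone has to decide what becomes a node, what becomes a literal and what is dropped. The ex: predicates and the dictionary layout are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


def email_to_triples(email_id: str, email: Dict[str, Any]) -> List[Triple]:
    """Map a parsed email to (subject, predicate, object) triples.

    Each choice below (which headers to keep, literals versus resources,
    one statement per recipient) is a modelling decision made by hand.
    """
    triples: List[Triple] = [
        (email_id, "ex:subject", str(email["subject"])),
        (email_id, "ex:date", str(email["date"])),
        (email_id, "ex:sender", str(email["from"])),
    ]
    # Recipients become separate nodes so that shared contacts link emails.
    for recipient in email.get("to", []):
        triples.append((email_id, "ex:recipient", recipient))
    for recipient in email.get("cc", []):
        triples.append((email_id, "ex:cc", recipient))
    # The body is deliberately not mapped: dropping it is also a choice.
    return triples


example = {"subject": "Meeting", "date": "2017-04-10", "from": "ex:alice",
           "to": ["ex:bob", "ex:carol"], "cc": ["ex:dave"], "body": "See you."}
for triple in email_to_triples("ex:email1", example):
    print(triple)
```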

The authors also claim that feature extraction removes
information. While this can be true, the authors neglect the
possibility that feature extraction can "add" information, notably the
background information the researcher has but that is not available in
computational form, or accessible through a knowledge graph.

I don't understand what the authors mean by saying the disjointness or
subclass axioms are "recursive" properties. In general, the relation
between a heterogeneous graph (a graph with both edge and node labels)
and a "knowledge graph" (that may have additional explicit semantics
and gives rise to inferences) is not clear in the manuscript.

The authors also present a rather simple version of how the Semantic
Web can lead to integration; simply reusing IRIs does not on its own
lead to "integration", due to different data models, different
semantics, etc. IRIs also do not have to be dereferenceable, and it is
not clear whether dereferencing an IRI (if that is possible) can lead
to retrieving "data" about the entity (usually we just get a website).

I do not understand how the Open World Assumption resolves the problem
of incomplete knowledge. Would an algorithm that learns on knowledge
graphs not also have to solve the problem of missing data, using
imputation methods? Is this not just "hiding" the imputation problem?
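The difference between the two readings of a missing triple can be sketched as follows. This is a toy illustration under assumed names (the ex: IRIs and both functions are hypothetical), not a description of any particular reasoner.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def ask_closed_world(kg: Set[Triple], s: str, p: str, o: str) -> bool:
    """Closed-world reading: a statement that is not in the graph is false."""
    return (s, p, o) in kg


def ask_open_world(kg: Set[Triple], s: str, p: str, o: str) -> Optional[bool]:
    """Open-world reading: a stated triple is true, a missing one is unknown.

    With no negation or disjointness axioms available, this toy version
    can never conclude False, only True or None (unknown).
    """
    return True if (s, p, o) in kg else None


kg = {("ex:john", "ex:worksFor", "ex:acme")}
print(ask_closed_world(kg, "ex:mary", "ex:worksFor", "ex:acme"))  # False
print(ask_open_world(kg, "ex:mary", "ex:worksFor", "ex:acme"))    # None
```

The open-world answer is more honest, but a learner that needs an employer for ex:mary still has to turn None into something it can compute with, which is the imputation question raised above.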

Distance in a knowledge graph does not always reflect similarity;
imagine a "disjoint with" edge in the knowledge graph, which should
increase dissimilarity instead of reflecting similarity.
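The following toy example illustrates the point, using a hypothetical ontology and assuming plain breadth-first hop count as the distance measure: when edge labels are ignored, two disjoint classes end up closer together than two sibling subclasses.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Labelled, undirected adjacency lists for a hypothetical toy ontology.
Edges = Dict[str, List[Tuple[str, str]]]


def add_edge(g: Edges, a: str, label: str, b: str) -> None:
    g.setdefault(a, []).append((label, b))
    g.setdefault(b, []).append((label, a))


def hop_distance(g: Edges, start: str, goal: str) -> Optional[int]:
    """Unweighted breadth-first search distance that ignores edge labels."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for _label, nxt in g.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal not reachable


g: Edges = {}
add_edge(g, "ex:Plant", "owl:disjointWith", "ex:Animal")
add_edge(g, "ex:Dog", "rdfs:subClassOf", "ex:Animal")
add_edge(g, "ex:Cat", "rdfs:subClassOf", "ex:Animal")
add_edge(g, "ex:Tree", "rdfs:subClassOf", "ex:Plant")

print(hop_distance(g, "ex:Plant", "ex:Animal"))  # 1, although disjoint
print(hop_distance(g, "ex:Dog", "ex:Cat"))       # 2, although both animals
```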

It would be better if Section 5 presented the different approaches
to learning on knowledge graphs in relation to one another. For example,
embeddings, tensor factorization, or RDF2Vec, are presented as very
different concepts, but they are quite related (embeddings can, for
example, be generated through tensor factorization). This relation
should be discussed instead of just listing these approaches.

Page 10 misses a figure and figure reference ("The GCN model (Figure
??)...").

The comparison of the GCN model and RDF2Vec states that GCN will
have to maintain the whole graph in memory, while RDF2Vec "need to
perform random walks on the graph". Please state why this does not
require keeping the whole graph in memory; why would a random walk not
require random access to potentially any node in the graph?
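The question can be made concrete with a minimal sketch. A uniform random walk only needs the adjacency list of the node it is currently on, so the lists could in principle be fetched on demand (for instance from a disk-backed store, which is an assumption here) rather than all held in memory. However, the next node can be any neighbour, so the store must still support random access to arbitrary nodes, and if walks are started from every entity, the whole graph is eventually touched anyway.

```python
import random
from typing import Callable, Dict, List, Set


def random_walk(start: str, length: int,
                neighbours: Callable[[str], List[str]],
                rng: random.Random) -> List[str]:
    """Uniform random walk that only asks for the neighbours of the
    current node at each step."""
    walk = [start]
    for _ in range(length):
        options = neighbours(walk[-1])
        if not options:
            break  # dead end: stop the walk early
        walk.append(rng.choice(options))
    return walk


# Hypothetical graph; in practice this lookup could hit an external store.
graph: Dict[str, List[str]] = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": [],
}
touched: Set[str] = set()


def lookup(node: str) -> List[str]:
    touched.add(node)  # record which adjacency lists were actually read
    return graph.get(node, [])


walk = random_walk("a", 3, lookup, random.Random(0))
print(walk, sorted(touched))
```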

The conclusions would be better stated as the conclusions of a
position, not as conclusions to research results (as it currently reads). I do not believe the
authors have "shown" the benefits of knowledge graphs in their paper;
they outline a vision/position in which heterogeneous structured
knowledge is represented as a knowledge graph and machine learning
algorithms are built to use knowledge graphs as input.

Nanopublication comments:

Further comments:

Review #2 submitted on 21/Apr/2017

By Kody Moodley

https://orcid.org/0000-0001-5666-1658

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This paper argues for the view that the knowledge graph (linked open data) should be used as the default data model for machine learning. A number of simple learning use-cases are used to motivate this view.

The first two sections of the paper introduce the concept of knowledge graphs as a representation format for data in general, and its inherent advantages. The third section illustrates, with the use of examples, how the representational formalism of knowledge graphs circumvents some problematic issues in machine learning, in particular the need for feature engineering and the biases that come with it.

Section four presents a summary of the main challenges to using knowledge graphs as the default data model for machine learning and section five gives an overview of some of the ways in which current machine learning approaches consume knowledge graphs.

Reasons to accept:

The paper is very convincing in its argument for knowledge graphs as a machine learning data model. The linkage of related datasets in the knowledge graph is especially attractive because it supplies, by default, more related features to the learning algorithm, features that would otherwise have to be discovered manually through feature engineering in a raw data setting.

The examples are simple and clear and they appropriately demonstrate the advantages of knowledge graphs as machine learning data model. The paper is well structured and well written in general.

Reasons to reject:

While the argument for knowledge graphs is well executed, the last third of the paper falls away slightly in its quality. Section 5 in particular mostly gives technical details of current approaches for consuming knowledge graphs. There is little detail on the shortcomings of these approaches or where improvement is needed. There needs to be more clarity in this regard.

Furthermore, while I was convinced by the cleaner and more structured representation format of knowledge graphs, as well as the associated reasoning properties, I couldn't help but wonder why there is no mention of evaluations for knowledge graph approaches to machine learning. Why are there no evaluations mentioned to test the accuracy of predictions made with these approaches? If there aren't any substantial evaluations or comparisons then why aren't there any?

Nanopublication comments:

Further comments:

While there are some shortcomings in the paper, the positives for me outweigh the negatives just enough for acceptance.

Minor suggestions / issues:

Page 1 - perhaps deep learning and feature engineering can be given some citations or short definitions?
Page 4 - "...disjoint with the class Animals..." -> "disjoint with the class Animal..."?
Page 4 - Integrated, dereferencable and disambiguated knowledge -> would have been nice to have quick examples to illustrate these concepts (space permitting)
Page 7 - "...use one of the many available imputation methods..." -> citations?
Page 7 - "...can become problematic when having to deal with a large number of them..." -> citations / example?
Page 7 - "...often by a factor of ten..." -> citation / example?
Page 7 - "... Knowledge graphs follow the OWA..." -> Perhaps a simpler definition for OWA is that missing information is not assumed to be false?
Page 7, Section 4.2 - "...represent explicit statements as well as background knowledge that be used to derive implicit statements..." -> We can also derive implicit background knowledge using reasoning engines. E.g. explicit statement: john is a man, background knowledge: man is a human, human has two legs. We can derive the implicit statement: john has two legs, but we can also derive implicit background knowledge: man has two legs
Page 8 - "...we have no way knowing of how..." -> "...we have no way OF knowing how..."
Page 8 - "...As an example, consider once more use use..." -> "As an example, consider once more the use..."
Page 8, Section 4.4 - "...This in contrast to non-literal resources, here their local neighbourhood is the 'value'..." -> Rewrite
Page 8, Section 4.5 - "...countless of subgraphs..." -> "...countless subgraphs..."
Page 8, Section 5 - "...community began..." -> "...community begun..."
Page 9, Section 5.1 - "...Tensors are the generalization..." -> "...A tensor is the generalization of a matrix into..." OR "...Tensors are generalizationS of matriCes into..."
Page 9, Section 5.1 - "this tensor model as layer" -> "this tensor model as A layer"
Page 9, Section 5.2.1 - "they can equally well be" -> "they can be equally well"
Page 10 - "which generates features vectors" -> "which generates feature vectors"
Page 10 - "and then proceed to used" -> "and then proceed to use"
Page 10 - "The GCN model..." -> (Figure number missing)
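The reasoning example in the Page 7, Section 4.2 suggestion can be checked with a small forward-chaining sketch. The predicates type, subClassOf and legs are simplified stand-ins (not rdf:type or rdfs:subClassOf), and the rules that let classes and their members inherit a leg count are hypothetical rules, not standard RDFS entailments.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def forward_chain(facts: Set[Triple]) -> Set[Triple]:
    """Apply three illustrative rules until no new triples appear:
    1. x type C,       C subClassOf D -> x type D
    2. C subClassOf D, D legs n       -> C legs n  (implicit background knowledge)
    3. x type C,       C legs n       -> x legs n  (implicit statement)
    """
    closed = set(facts)
    while True:
        new: Set[Triple] = set()
        for (s1, p1, o1) in closed:
            for (s2, p2, o2) in closed:
                if s2 != o1:
                    continue  # rules only chain through a shared term
                if p1 == "type" and p2 == "subClassOf":
                    new.add((s1, "type", o2))
                elif p1 == "subClassOf" and p2 == "legs":
                    new.add((s1, "legs", o2))
                elif p1 == "type" and p2 == "legs":
                    new.add((s1, "legs", o2))
        if new <= closed:
            return closed  # fixpoint reached
        closed |= new


facts = {("john", "type", "Man"), ("Man", "subClassOf", "Human"),
         ("Human", "legs", "2")}
print(sorted(forward_chain(facts) - facts))
# [('Man', 'legs', '2'), ('john', 'legs', '2'), ('john', 'type', 'Human')]
```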

Review #3 submitted on 25/Apr/2017

By Heiko Paulheim

https://orcid.org/0000-0003-4386-8195

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Reject
Technical Quality of the paper: Weak
Presentation: Average
Reviewer's confidence: Medium
Significance: High significance
Background: Reasonable
Novelty: Clear novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This position paper presents the vision of a paradigm shift in machine learning: stepping from propositional data to (knowledge) graphs as a more universal data representation, and developing end-to-end learning approaches that do not follow the currently dominant pattern of first converting the graph into a propositional form and then applying standard machine learning on top.

Reasons to accept:

* interesting vision
* probably stimulating some interesting thoughts for other researchers

Reasons to reject:

* argumentation not always convincing
* slight over-selling

Nanopublication comments:

Further comments:

In general, being a knowledge graph enthusiast myself, I like the idea of the paper. However, I miss a stronger argumentation towards knowledge graphs, as they are just one paradigm of how to represent data, and enthusiasts of other paradigms might have other feelings. In general, the whole paper would also work if "knowledge graph" were replaced with "relational database" (i.e., developing an end-to-end system that operates on a whole relational database including its schema, not just a single propositional table), or with "XML", or with "NoSQL database". Thus, the paper would benefit from a stronger argumentation why knowledge graphs are supposed to be the superior paradigm.

On the detail level, there are a few questionable claims. The example in Fig. 1 is introduced as if it was a standard model (although section 4.3 relativizes that interpretation), but there are a few options here. For example, the birthdate could also have been expressed using the W3C Time Ontology [1], the age could be an observation with a timestamp instead of a simple literal, etc.

The claim about the open world assumption as an inherent means to describe incomplete knowledge is also questionable. Technically, I do not see a difference to a database with a null value. The paper states that an end-to-end system for learning with knowledge graphs still needs to be developed, but the same can be said about databases with null values. Furthermore, the claim that the number of entities with a missing property is ten times higher than the number of entities with a property value needs a source.

In section 2.2, the expression of background knowledge in an ontology and the use of reasoning is discussed. This somewhat contradicts the end-to-end idea: if a reasoner is used as a preprocessing step, it is impossible to get back before that step, e.g., for parameter tuning (such as changing the set of inference rules). Furthermore, when regarding the deep learning paradigm: why not feed the learning algorithm with the knowledge graph and the ontology and let it figure out the appropriate reasoning itself (see, e.g., [2])? Likewise, in section 4.2, it is claimed that machine learning algorithms cannot exploit subclass hierarchies. This is, to the best of my knowledge, not true, as there are both hierarchical classification approaches [3] as well as learning approaches that can handle hierarchies in features (e.g., [4]).
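As a minimal illustration that an ordinary learner can consume a subclass hierarchy, one simple encoding (an assumption here, not necessarily the method of [3] or [4]) expands each entity's types with all of their superclasses before building a binary feature vector. The class names and the single-parent hierarchy are hypothetical.

```python
from typing import Dict, List, Set


def ancestors(cls: str, parent: Dict[str, str]) -> Set[str]:
    """cls plus all its superclasses in a single-parent (tree) hierarchy."""
    result = {cls}
    while cls in parent:
        cls = parent[cls]
        result.add(cls)
    return result


def hierarchy_features(types: List[str], parent: Dict[str, str],
                       vocabulary: List[str]) -> List[int]:
    """Binary vector with a 1 for every class an entity belongs to,
    directly or through the subclass hierarchy."""
    active: Set[str] = set()
    for t in types:
        active |= ancestors(t, parent)
    return [1 if c in active else 0 for c in vocabulary]


parent = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal", "Oak": "Plant"}
vocab = ["Animal", "Mammal", "Plant", "Dog", "Cat", "Oak"]
print(hierarchy_features(["Dog"], parent, vocab))  # [1, 1, 0, 1, 0, 0]
print(hierarchy_features(["Cat"], parent, vocab))  # [1, 1, 0, 0, 1, 0]
```

Dog and Cat now share the Mammal and Animal features, so a standard classifier can generalize across sibling classes without any graph-specific machinery.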

The discussion of text literals as a challenge is good, but it is not the only challenge. Data is not just values and literals, there is also media data. A universal data representation, as proposed in this paper, should also treat those as first class citizens; the same holds for streaming data. Without such aspects discussed, the claim that knowledge graphs can be *the* data model for machine learning is a bit too bold.

Overall, I like the idea of the paper, and the vision presented. However, the discussion is not thorough enough and requires a bit more maturity before publication.

Minor observation:
* in section 5.3, there is a reference to a missing figure

[1] W3C. "Time Ontology in OWL." https://www.w3.org/TR/owl-time/
[2] Paulheim, Heiko, and Heiner Stuckenschmidt. "Fast approximate A-box consistency checking using machine learning." 2016.
[3] Silla Jr, Carlos N., and Alex A. Freitas. "A survey of hierarchical classification across different application domains." Data Mining and Knowledge Discovery 22.1-2 (2011): 31-72.
[4] Arnold, Andrew, Ramesh Nallapati, and William W. Cohen. "Exploiting feature hierarchy for transfer learning in named entity recognition." ACL (2008).


Meta-Review by Editor

Submitted by Tobias Kuhn on Thu, 06/22/2017 - 01:28

The manuscript offers a vision for data science focused on knowledge graphs and machine learning. The reviewers found the vision to be stimulating, but additional argumentation is required to make a convincing case for the vision.

Michel Dumontier (http://orcid.org/0000-0003-4727-9435)

Data Science
