The Knowledge Graph as the Default Data Model for Machine Learning

Tracking #: 439-1419


Xander Wilcke
Peter Bloem
Victor de Boer

Responsible editor: 

Michel Dumontier

Submission Type: 

Position Paper


Abstract:

In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details they thought salient, they now prefer the data as raw as possible. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. In some areas, however, we struggle to find this raw form of data. One such area involves heterogeneous knowledge: entities, their attributes, and their internal relations. The Semantic Web community has invested decades of work in just this problem: how to represent knowledge, in various domains, in as raw and as usable a form as possible, satisfying many use cases. This work has led to the Linked Open Data Cloud, a vast and distributed knowledge graph. If we can develop methods that operate on this raw form of data - the knowledge graph - we can dispense with a great deal of ad-hoc feature engineering and train deep models end-to-end in many more domains. In this position paper, we describe current research in this area and discuss some of the promises and challenges of this approach.




Date of Submission: 

Monday, April 10, 2017

Date of Decision: 

Tuesday, April 25, 2017



Solicited Reviews:

Meta-Review by Editor

The manuscript offers a vision for data science centered on knowledge graphs and machine learning. The reviewers found the vision stimulating, but additional argumentation is required to make a convincing case for it.

Michel Dumontier