The Knowledge Graph as the Default Data Model for Machine Learning

Tracking #: 465-1445

Responsible editor: 

Michel Dumontier

Submission Type: 

Position Paper


In modern machine learning, raw data is the preferred input for our models. Where a decade ago data scientists were still engineering features, manually picking out the details they thought salient, they now prefer data in its raw form. As long as we can assume that all relevant and irrelevant information is present in the input data, we can design deep models that build up intermediate representations to sift out relevant features. However, these models are often domain-specific and tailored to the task at hand, and therefore unsuited for learning on heterogeneous knowledge: information of different types and from different domains. If we can develop methods that operate on this form of knowledge, we can dispense with a great deal of ad hoc feature engineering and train deep models end-to-end in many more domains. To accomplish this, we first need a data model capable of expressing heterogeneous knowledge naturally in various domains, in as usable a form as possible, and satisfying as many use cases as possible. In this position paper, we argue that the knowledge graph is a suitable candidate for this data model. This paper describes current research and discusses some of the promises and challenges of this approach.


Previous Version: 


  • Reviewed

Date of Submission: 

Tuesday, May 16, 2017

Date of Decision: 

Tuesday, June 20, 2017

Solicited Reviews:


Meta-Review by Editor

We are pleased to inform you that your paper has been accepted for publication, under the condition that you address the remaining minor issues.

The reviewers found that the revised manuscript largely addressed all of the points raised. For the paper to be suitable for publication, please address the following two points:

  • Remove "default" data model from the title.
  • Extend the discussion to further address contextual limitations of knowledge graphs.

Michel Dumontier (