OntoLDA: An Ontology-based Topic Model for Automatic Topic Labeling

Tracking #: 492-1472

Responsible editor: 

Victor de Boer

Submission Type: 

Research Paper


Topic models, which frequently represent topics as multinomial distributions over words, have been extensively used for discovering latent topics in text corpora. Topic labeling, which aims to assign meaningful labels to discovered topics, has recently gained significant attention. In this paper, we argue that the quality of topic labeling can be improved by considering ontology concepts rather than words alone, in contrast to previous work in this area, which usually represents topics via groups of words selected from the topics. We have created: (1) a topic model that integrates ontological concepts with topic models in a single framework, where each topic is represented as a multinomial distribution over concepts and each concept as a multinomial distribution over words, and (2) a topic labeling method based on the ontological meaning of the concepts included in the discovered topics. In selecting the best topic labels, we rely on the semantic relatedness of the concepts and their ontological classifications. The results of our experiments on two different data sets show that introducing ontological concepts as additional, richer features between topics and words, and describing topics in terms of concepts, offers an effective method for generating meaningful labels for the discovered topics.
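The two-level representation in the abstract (each topic a multinomial over concepts, each concept a multinomial over words) implies that a topic's word distribution can be recovered by marginalizing out the concept layer: p(w|t) = sum over c of p(c|t) * p(w|c). The following is a minimal Python sketch of that relationship only, assuming hypothetical toy concepts and probabilities that are not taken from the paper; it does not reproduce OntoLDA's inference or its labeling procedure.

```python
from typing import Dict

# Hypothetical toy parameters (illustration only, not from the paper).
# p(concept | topic): the topic is a multinomial over ontology concepts.
topic_concepts: Dict[str, float] = {"Machine_learning": 0.7, "Statistics": 0.3}

# p(word | concept): each concept is a multinomial over words.
concept_words: Dict[str, Dict[str, float]] = {
    "Machine_learning": {"model": 0.5, "training": 0.3, "data": 0.2},
    "Statistics": {"data": 0.6, "variance": 0.4},
}


def word_distribution(
    topic: Dict[str, float], concepts: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Marginalize out the concept layer: p(w|t) = sum_c p(c|t) * p(w|c)."""
    result: Dict[str, float] = {}
    for concept, p_c in topic.items():
        for word, p_w in concepts[concept].items():
            result[word] = result.get(word, 0.0) + p_c * p_w
    return result


dist = word_distribution(topic_concepts, concept_words)
# Print words in order of decreasing probability under the topic.
for word, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

With these toy values the ranking is model (0.35), data (0.32), training (0.21), variance (0.12). Note that "data" gains mass from both concepts, which is the kind of shared-word effect a concept layer between topics and words makes explicit.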



  • Reviewed

Date of Submission: 

Friday, June 23, 2017

Date of Decision: 

Tuesday, July 18, 2017

Solicited Reviews:


About extension guidelines

(This comment is based on an email exchange, which I copy here for transparency.)

After checking with the editors-in-chief, we arrived at the following preliminary guidelines (they will be made explicit on the submission page in the near future, so thanks for kickstarting this!):

  • Extensions of previous work are generally acceptable
  • The paper must clearly state that it is an extended version and explain how it extends the previous work
  • There should be at least 25% new content
  • The new content should cover significant additional elements (not just explain the same things in more words)

Based on this, preliminary guidelines were published on the Data Science journal's author page: 


Meta-Review by Editor

Your manuscript was carefully reviewed by two reviewers, who both point out that, although this is an interesting paper with an interesting approach, the main problem is that the current version is too similar to the previously published paper "Automatic Topic Labeling using Ontology-based Topic Models". Given this consensus, I decided not to wait for a third review but to base my decision on these two reviews.

Most journals, including Data Science, accept extended versions of workshop or conference papers, provided that (a) the paper clearly states that it is an extended version and clarifies how it extends the previous work, and (b) there is a significant amount of new content and additional elements (not just the same things explained in more words). After discussion in the editorial board, it was decided to make these guidelines explicit on the journal's information pages. The minimum amount of new content was set at 25%.

I would therefore like to encourage you to resubmit this paper to this journal if you are able to address these issues by extending the paper significantly, and to address the other issues raised in the reviews.

Victor de Boer (http://orcid.org/0000-0001-9079-039X)